Abstract

In this paper, we consider a bilevel polynomial optimization problem where the objective and the 
constraint functions of both the upper and the lower level problems are polynomials. We present 
methods for finding its global minimizers and global minimum using a sequence of semidefinite 
programming (SDP) relaxations and provide convergence results for the methods. Our scheme for 
problems with a convex lower-level problem involves solving a transformed equivalent single-level 
problem by a sequence of SDP relaxations; whereas our approach for general problems involving 
a non-convex polynomial lower-level problem solves a sequence of approximation problems via 
another sequence of SDP relaxations. 

Key words: Bilevel programming, global optimization, polynomial optimization, semidefinite 
programming hierarchies. 

1 Introduction 


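Consider the bilevel polynomial optimization problem

\[
\begin{array}{lll}
(P) & \displaystyle \min_{x \in \mathbb{R}^n,\ y \in \mathbb{R}^m} & f(x,y) \\
 & \text{subject to} & g_i(x,y) \le 0,\ i = 1,\dots,s, \\
 & & y \in Y(x) := \operatorname{argmin}_{w \in \mathbb{R}^m} \{ G(x,w) : h_j(w) \le 0,\ j = 1,\dots,r \},
\end{array}
\]

where $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, $g_i : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, $G : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ and $h_j : \mathbb{R}^m \to \mathbb{R}$ are all polynomials
with real coefficients, and we make the blanket assumption that the feasible set of (P) is nonempty,
that is, $\{(x,y) \in \mathbb{R}^n \times \mathbb{R}^m : g_i(x,y) \le 0,\ i = 1,\dots,s,\ y \in Y(x)\} \ne \emptyset$.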

Bilevel optimization provides mathematical models for hierarchical decision making processes where 
the follower’s decision depends on the leader’s decision. More precisely, if x and y are the decision 
variables of the leader and the follower respectively then the problem (P) represents the so-called 
optimistic approach to the leader and follower's game, in which the follower is assumed to be
co-operative, so the leader can choose the solution with the lowest cost. We note that there is another
approach, called the pessimistic approach, which assumes that the follower may not be co-operative and
hence the leader needs to prepare for the worst cost (see, for example, [11, 44]).


The bilevel optimization problem (P) also requires that the constraints of the lower level problem 
are independent of the upper level decision variable x (i.e. the functions hj do not depend on x). 
This independence assumption guarantees that the optimal value function of the lower level problem 
is automatically continuous, and so, plays an important role later in establishing convergence of our 
proposed approximation schemes for finding global optimal solutions of (P). A discussion on this 
assumption and its possible relaxation is given in Remark 4.9 in Section 4 of the paper. 


As noted in [H], the models of the form (P) cover the situations in which the leader can only observe 
the outcome of the follower's action but not the action itself, and so have important applications in
economics such as the so-called moral hazard model of the principal-agent problem. In particular,
in the special case where $g_i$ depends only on $x$ and the sets $\{x \in \mathbb{R}^n : g_i(x) \le 0\}$ and
$\{w \in \mathbb{R}^m : h_j(w) \le 0\}$ are both convex, problem (P) has been studied in [31], where a smoothing projected
gradient algorithm has been proposed to find a stationary point of problem (P). On the other hand,
the functions $f, g_i, G, h_j$ of (P) in [31] are allowed to be continuously differentiable functions which
may not be polynomials in general. For applications and recent developments of solving more general 
bilevel optimization problems, see [3 El uni Ellis]- 


In this paper, in the interest of simplicity, we focus on the optimistic approach to the hierarchical 
decision making process and develop methods for finding a global minimizer and global minimum of 
(P). We make the following key contributions to bilevel optimization. 

• A novel SDP hierarchy for bilevel polynomial problems. We propose general-purpose
schemes for finding global solutions of the bilevel polynomial optimization problem (P)
by solving hierarchies of semidefinite programs and establish convergence of the schemes. Our
approach makes use of the known techniques of bilevel optimization and the recent developments
of (single-level) polynomial optimization, such as the sums-of-squares decomposition and
semidefinite programming hierarchy, and does not use any discretization or branch-and-bound
techniques as in [ElETlIlll. 

• Convex lower-level problems: Convergence to global solutions. We first transform the 
bilevel polynomial optimization problem (P) with a convex lower-level problem into an equivalent 
single level nonconvex polynomial optimization problem. We show that the values of the standard 
semidefinite programming relaxations of the transformed single level problem converge to the 
global optimal value of the bilevel problem (P) under a technical assumption that is commonly
used in polynomial optimization (see [26] and other references therein).

• Non-convex lower-level problems: A new convergent approximation scheme. By examining
a sequence of $\epsilon$-approximation (single-level) problems of the bilevel problem (P) with a
not necessarily convex lower level problem, we present another convergent sequence of SDP relaxations
of (P) under suitable conditions. Our approach extends the sequential SDP relaxations,
introduced in m for parameterized single-level polynomial problems, to bilevel polynomial optimization
problems.

It is important to note that local bilevel optimization techniques, studied extensively in the literature 
mm, apply to broad classes of nonlinear bilevel optimization problems. In the present work, we 
employ some basic tools and techniques of semi-algebraic geometry to achieve convergence of our 
semidefinite programming hierarchies of global nonlinear bilevel optimization problems, and so our 
approaches are limited to studying the class of polynomial bilevel optimization problems. 

Moreover, due to the limitations of SDP solvers, our proposed schemes can be used
to solve problems of small or moderate size, and they may not be able to compete with the ad hoc
(but computationally tractable) techniques, such as branch-and-bound methods and discretization 
schemes. For instance, underestimation and branch-and-bound techniques were used in mCZlET! 
and a generalized semi-infinite programming reformulation together with a discretization technique 
was employed in [44]. See http://bilevel.org/ for other references on computational methods of bilevel
optimization. 

However, it has recently been shown that, by exploiting sparsity and symmetry, large size problems 
can be solved efficiently and various numerical packages have been built to solve real-life problems 
such as the sensor network localization problem [24]. We leave the study of solving large size bilevel
problems for future research as it is beyond the scope of this paper. 

The outline of the paper is as follows. Section 2 gives preliminary results on polynomials and continuity
properties of the solution map of the lower-level problem of (P). Section 3 presents convergence
of our sequential SDP relaxation scheme for solving the problem (P) with a convex lower-level problem.
Section 4 describes another sequential SDP relaxation scheme and its convergence for solving the
general problem (P) with a not necessarily convex lower-level problem. Section 5 reports results of
numerical implementations of the proposed methods for solving some bilevel optimization test problems.
The appendix provides details of various technical results of semi-algebraic geometry used in
the paper and also proofs of certain technical results. 


2 Preliminaries 


We begin by fixing notation, definitions and preliminaries. Throughout this paper $\mathbb{R}^n$ denotes the
Euclidean space with dimension $n$. The inner product in $\mathbb{R}^n$ is defined by $\langle x, y \rangle := x^T y$ for all $x, y \in \mathbb{R}^n$.
The open (resp. closed) ball in $\mathbb{R}^n$ centered at $x$ with radius $\rho$ is denoted by $\mathbb{B}(x,\rho)$ (resp. $\overline{\mathbb{B}}(x,\rho)$).
The non-negative orthant of $\mathbb{R}^n$ is denoted by $\mathbb{R}^n_+$ and is defined by $\mathbb{R}^n_+ := \{(x_1,\dots,x_n) \in \mathbb{R}^n \mid x_i \ge 0\}$.
Denote by $\mathbb{R}[x]$ the ring of polynomials in $x := (x_1, x_2, \dots, x_n)$ with real coefficients. For a polynomial
$f$ with real coefficients, we use $\deg f$ to denote the degree of $f$. For a differentiable function $f$ on $\mathbb{R}^n$,
$\nabla f$ denotes its derivative. For a differentiable function $g : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, we use $\nabla_x g$ (resp. $\nabla_y g$)
to denote the derivative of $g$ with respect to the first variable (resp. second variable). We also use $\mathbb{N}$
(resp. $\mathbb{N}_{>0}$) to denote all the nonnegative (resp. positive) integers. Moreover, for any integer $t$, let
$\mathbb{N}^n_t := \{\alpha \in \mathbb{N}^n : \sum_{i=1}^n \alpha_i \le t\}$. For a set $A$ in $\mathbb{R}^n$, we use $\mathrm{cl}(A)$ and $\mathrm{int}(A)$ to denote the closure and
interior of $A$. For a given point $x$, the distance from the point $x$ to a set $A$ is denoted by $d(x, A)$ and
is defined by $d(x,A) = \inf\{\|x - a\| : a \in A\}$.

We say that a real polynomial $f \in \mathbb{R}[x]$ is sum-of-squares (SOS) if there exist real polynomials
$f_j$, $j = 1,\dots,r$, such that $f = \sum_{j=1}^r f_j^2$. The set of all sum-of-squares real polynomials in $x$ is denoted
by $\Sigma^2[x]$. Moreover, the set of all sum-of-squares real polynomials in $x$ with degree at most $d$ is denoted
by $\Sigma^2_d[x]$. We also recall some notions and results of semi-algebraic functions/sets, which can be found

in [gES]. 

Definition 2.1 (Semi-algebraic sets and functions) A subset of $\mathbb{R}^n$ is called semi-algebraic if it
is a finite union of sets of the form $\{x \in \mathbb{R}^n : f_i(x) = 0,\ i = 1,\dots,k;\ f_i(x) > 0,\ i = k+1,\dots,p\}$, where
all $f_i$ are real polynomials. If $A \subseteq \mathbb{R}^n$ and $B \subseteq \mathbb{R}^p$ are semi-algebraic sets then the map $f : A \to B$
is said to be semi-algebraic if its graph $\{(x,y) \in A \times B : y = f(x)\}$ is a semi-algebraic subset in
$\mathbb{R}^n \times \mathbb{R}^p$.


Semi-algebraic sets and functions are important classes of sets and functions and they have important
applications in nonsmooth optimization (for a recent development, see [7]). In particular, they
enjoy a number of remarkable properties. Some of these properties, which are used later in the paper, 
have been summarized in the Appendix A for the convenience of the reader. 

We now present a preliminary result on Hölder continuity of the solution mapping of the lower level
problem. As a consequence, we provide an existence result of the solution of a bilevel polynomial
optimization problem (P).

Let $F : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ be a set-valued mapping and let $\bar y \in F(\bar x)$. Recall that $F$ is said to be Hölder
continuous (calm) at $(\bar x, \bar y)$ with exponent $\tau \in (0,1]$ if there exist $\delta, \epsilon, c > 0$ such that

\[
d(y, F(\bar x)) \le c\, \|x - \bar x\|^{\tau} \quad \text{for all } y \in F(x) \cap \mathbb{B}_{\mathbb{R}^m}(\bar y, \epsilon) \text{ and } x \in \mathbb{B}_{\mathbb{R}^n}(\bar x, \delta).
\]

In the case when $\tau = 1$, this property is often referred to as calmness and has been well studied in
nonsmooth analysis (see for example [8]). We first see that even in the case where $G$ is a continuously
differentiable function and the set $\{y \in \mathbb{R}^m : h_j(y) \le 0\}$ is compact, the solution map $Y : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$
of the lower level problem $Y(x) := \operatorname{argmin}_{y \in \mathbb{R}^m}\{G(x,y) : h_j(y) \le 0,\ j = 1,\dots,r\}$ is not necessarily
Hölder continuous for any exponent $\tau > 0$.

Example 2.2 (Failure of Hölder continuity for solution map of the lower level problem:
non-polynomial case) Let $f : \mathbb{R} \to \mathbb{R}$ be defined by

\[
f(y) = \begin{cases} e^{-1/y^2} & \text{if } y \ne 0, \\ 0 & \text{if } y = 0. \end{cases}
\]

Consider the solution mapping $Y(x) = \operatorname{argmin}_{y}\{G(x,y) : y^2 \le 1\}$ for all $x \in [-1,1]$, where
$G(x,y) = (x - f(y))^2$. Then, it can be verified that $G$ is a continuously differentiable function
(indeed it is a $C^\infty$ function) and

\[
Y(x) = \begin{cases} \{\pm 1/\sqrt{-\ln x}\} & \text{if } x \in (0,1], \\ \{0\} & \text{if } x \in [-1,0]. \end{cases}
\]

We now see that the solution mapping $Y$ is not Hölder continuous at $0$ with exponent $\tau$ for any
$\tau \in (0,1]$. To see this, let $x_k = e^{-k} \to 0$ and $y_k = 1/\sqrt{k} \in Y(x_k)$. Then, for any $\tau \in (0,1]$,

\[
\frac{|x_k|^{\tau}}{d(y_k, Y(0))} = \frac{e^{-\tau k}}{1/\sqrt{k}} = \sqrt{k}\, e^{-\tau k} \to 0.
\]

So, the solution mapping is not Hölder continuous at $0$ for any $\tau \in (0,1]$.
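The divergence in Example 2.2 can be checked numerically. The sketch below assumes the reading $f(y) = e^{-1/y^2}$, $x_k = e^{-k}$ and $y_k = 1/\sqrt{k}$ (so that $f(y_k) = x_k$ and $Y(0) = \{0\}$); these concrete choices and all function names are illustrative assumptions.

```python
import math

def f(y: float) -> float:
    """Flat smooth function: exp(-1/y^2) for y != 0, and 0 at y = 0."""
    return 0.0 if y == 0 else math.exp(-1.0 / (y * y))

def holder_ratio(k: int, tau: float) -> float:
    """|x_k|^tau / d(y_k, Y(0)) for x_k = exp(-k), y_k = 1/sqrt(k), Y(0) = {0}.

    Written as sqrt(k) * exp(-tau * k) to avoid forming exp(-k) ** tau.
    """
    return math.sqrt(k) * math.exp(-tau * k)

# y_k solves the lower-level problem for x_k: G(x_k, y_k) = (x_k - f(y_k))^2 = 0.
lower_level_ok = all(
    math.isclose(f(1.0 / math.sqrt(k)), math.exp(-k), rel_tol=1e-9)
    for k in (1, 4, 16, 64)
)

# For every exponent tau the ratio tends to 0, so no bound
# d(y_k, Y(0)) <= c * |x_k|^tau can hold with a fixed constant c.
ratios = {tau: [holder_ratio(k, tau) for k in (10, 100, 1000)]
          for tau in (1.0, 0.5, 0.1)}
```

A Hölder estimate with exponent $\tau$ would require these ratios to stay bounded below by $1/c > 0$, which the decreasing sequences rule out.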


The Hölder continuity of the solution set of general parametric optimization problems has been
established under suitable regularity conditions, for example see [la E]. This property plays an 
important role in establishing the existence of solutions for bilevel programming problems and equilibrium
problems (see for example [33] and Corollary 2.5). Next, we show that the solution map of
a lower level problem of a bilevel polynomial optimization problem is always Hölder continuous with
an explicit exponent which depends only on the degree of the polynomials involved and the dimension
of the underlying space. This result is based on our recently established Łojasiewicz inequality for
nonconvex polynomial systems in [30].


For $m, d \in \mathbb{N}$, denote

\[
R(m,d) := \begin{cases} 1 & \text{if } d = 1, \\ d(3d-3)^{m-1} & \text{if } d \ge 2. \end{cases} \tag{2.1}
\]


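Theorem 2.3 (Hölder continuity of solution maps in the lower level problem: polynomial
case) Let $h_j$, $j = 1,\dots,r$, and $G$ be polynomials with real coefficients. Denote $d :=
\max\{\deg h_j, \deg G(x,\cdot)\}$. Suppose that $F := \{y \in \mathbb{R}^m : h_j(y) \le 0\}$ is compact. Then, the solution
map $Y : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ in the lower level problem $Y(x) := \operatorname{argmin}_{y \in \mathbb{R}^m}\{G(x,y) : h_j(y) \le 0,\ j = 1,\dots,r\}$
satisfies the following Hölder continuity property at each point $\bar x \in \mathbb{R}^n$: for any $\delta > 0$, there is a
constant $c > 0$ such that

\[
Y(x) \subseteq Y(\bar x) + c\, \|x - \bar x\|^{\tau}\, \overline{\mathbb{B}}_{\mathbb{R}^m}(0,1) \quad \text{whenever } \|x - \bar x\| \le \delta, \tag{2.2}
\]

for some $\tau \in [\tau_0, 1]$ with $\tau_0 = \max\left\{\frac{1}{R(m+r+1,\, d+1)}, \frac{2}{R(m+r,\, 2d)}\right\}$. In particular, $Y$ is Hölder continuous at
$\bar x$ with exponent $\tau_0$ for any $\bar x \in \mathbb{R}^n$.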
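Proof. For any fixed $\bar x \in \mathbb{R}^n$, define $\Phi(\bar x) = \min_{y \in \mathbb{R}^m}\{G(\bar x,y) : h_j(y) \le 0,\ j = 1,\dots,r\}$ and let

\[
\phi(y) := \sum_{j=1}^{r} [h_j(y)]_+ + |\Phi(\bar x) - G(\bar x,y)|.
\]

Then, for the fixed $\bar x$,

\[
\{y \in \mathbb{R}^m \mid \phi(y) = 0\} = Y(\bar x) = \{y \in \mathbb{R}^m \mid h_j(y) \le 0,\ j = 1,\dots,r, \text{ and } \Phi(\bar x) - G(\bar x,y) = 0\}.
\]

Note that $F$ is compact. Now, the Łojasiewicz inequality for nonconvex polynomial systems [30,
Corollary 3.8] gives that there is a constant $c_0 > 0$ such that

\[
d(y, Y(\bar x)) \le c_0\, \phi(y)^{\tau} \quad \text{for all } y \in F, \tag{2.3}
\]

for some $\tau \in [\tau_0, 1]$ with $\tau_0 = \max\left\{\frac{1}{R(m+r+1,\, d+1)}, \frac{2}{R(m+r,\, 2d)}\right\}$. Further, there is a constant $L > 0$ such
that

\[
|G(x,y) - G(\bar x,y)| \le L\|x - \bar x\| \tag{2.4}
\]

for all $y \in F$ and for all $x$ with $\|x - \bar x\| \le \delta$. Denote $c := (2\beta^{-1}L)^{\tau}$ with $\beta := c_0^{-1/\tau} > 0$. For any
$y \in Y(x)$ we select now $\bar y \in Y(\bar x)$ satisfying $\|y - \bar y\| = d(y, Y(\bar x))$. To finish the proof, it suffices to
show that

\[
\|y - \bar y\| \le c\|x - \bar x\|^{\tau}. \tag{2.5}
\]

To see this, note that $|\Phi(\bar x) - G(\bar x,y)| = \phi(y) \ge \beta\, d(y, Y(\bar x))^{1/\tau} = \beta\|y - \bar y\|^{1/\tau}$. Since $\bar y \in Y(\bar x)$, it
follows that $G(\bar x,\bar y) = \Phi(\bar x) \le G(\bar x,y)$, and hence

\[
\|y - \bar y\|^{1/\tau} \le \beta^{-1}|\Phi(\bar x) - G(\bar x,y)| = \beta^{-1}\big(G(\bar x,y) - G(\bar x,\bar y)\big). \tag{2.6}
\]

Furthermore, as $y \in Y(x)$, $G(x,y) \le G(x,\bar y)$, and therefore (2.4) gives us that

\begin{align*}
G(\bar x,y) - G(\bar x,\bar y) &= \big(G(\bar x,y) - G(x,y)\big) + \big(G(x,y) - G(x,\bar y)\big) + \big(G(x,\bar y) - G(\bar x,\bar y)\big) \\
&\le \big(G(\bar x,y) - G(x,y)\big) + \big(G(x,\bar y) - G(\bar x,\bar y)\big) \\
&\le 2L\|x - \bar x\| \quad \text{as } y, \bar y \in F.
\end{align*}

This together with (2.6) yields

\[
\|y - \bar y\|^{1/\tau} \le \beta^{-1}\big(G(\bar x,y) - G(\bar x,\bar y)\big) \le 2\beta^{-1}L\|x - \bar x\|.
\]

Thus

\[
d(y, Y(\bar x)) = \|y - \bar y\| \le c\|x - \bar x\|^{\tau},
\]

which verifies (2.5) and completes the proof of the theorem.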


In general, our lower estimate of the exponent $\tau$ will not be tight. We present a simple example to
illustrate this. 
Example 2.4 Consider the solution mapping $Y(x) = \operatorname{argmin}_{y \in \mathbb{R}}\{(x - y^2)^2 : y^2 \le 1\}$ for all $x \in [-1,1]$.
Clearly,

\[
Y(x) = \begin{cases} \{\pm\sqrt{x}\} & \text{if } x \in [0,1], \\ \{0\} & \text{if } x \in [-1,0). \end{cases}
\]

So, the solution mapping is Hölder continuous at $0$ with exponent $1/2$. On the other hand, our lower
estimate gives $\tau_0 = 1/84$. So, the lower estimate is not tight.
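The value $1/84$ quoted in Example 2.4 can be reproduced with exact rational arithmetic. The sketch below assumes that the lower estimate has the form $\tau_0 = \max\{1/R(m+r+1, d+1),\ 2/R(m+r, 2d)\}$ with $R$ as in (2.1), and that the example has $m = 1$ lower-level variable, $r = 1$ constraint and degree $d = 4$; the function names are hypothetical.

```python
from fractions import Fraction

def R(m: int, d: int) -> int:
    """R(m, d) from (2.1): 1 if d == 1, otherwise d * (3d - 3)^(m - 1)."""
    if d == 1:
        return 1
    return d * (3 * d - 3) ** (m - 1)

def tau0(m: int, r: int, d: int) -> Fraction:
    """Assumed lower estimate max{1/R(m+r+1, d+1), 2/R(m+r, 2d)}."""
    return max(Fraction(1, R(m + r + 1, d + 1)),
               Fraction(2, R(m + r, 2 * d)))

# Example 2.4 (assumed data): G(x, y) = (x - y^2)^2 has degree 4 in y,
# the single constraint y^2 - 1 <= 0 has degree 2, so d = 4, m = 1, r = 1.
example_tau0 = tau0(m=1, r=1, d=4)   # R(3, 5) = 720 and R(2, 8) = 168
```

Here the second term $2/168 = 1/84$ dominates the first term $1/720$, far below the true exponent $1/2$.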


Corollary 2.5 (Existence of global minimizer) For the bilevel polynomial optimization problem
(P), let $K = \{(x,y) \in \mathbb{R}^n \times \mathbb{R}^m : g_i(x,y) \le 0\}$ and $F = \{w \in \mathbb{R}^m : h_j(w) \le 0\}$. Suppose that
$K_1 = \{x \in \mathbb{R}^n : (x,y) \in K \text{ for some } y \in \mathbb{R}^m\}$ and $F$ are compact sets. Then, a global minimizer for
(P) exists.


Proof. Denote the optimal value of problem (P) by val(P). Let $(x_k, y_k)$ be a minimizing sequence
for the bilevel polynomial optimization problem (P) in the sense that $g_i(x_k,y_k) \le 0$, $i = 1,\dots,s$,
$h_j(y_k) \le 0$, $j = 1,\dots,r$, $y_k \in Y(x_k)$ and $f(x_k,y_k) \to \mathrm{val}(P)$. Clearly, $(x_k,y_k) \in K$ (and so, $x_k \in K_1$)
and $y_k \in F$. By passing to a subsequence, we may assume that $(x_k,y_k) \to (\bar x,\bar y) \in K_1 \times F$. By
continuity, we have $f(\bar x,\bar y) = \mathrm{val}(P)$. To see the conclusion, it suffices to show that $\bar y \in Y(\bar x)$. Denote
$\epsilon_k = \|x_k - \bar x\| \to 0$. Then, by Theorem 2.3, there is $c > 0$ such that

\[
Y(x_k) \subseteq Y(\bar x) + c\,\epsilon_k^{\tau}\, \overline{\mathbb{B}}_{\mathbb{R}^m}(0,1) \quad \text{for all } k \in \mathbb{N}.
\]

As $y_k \in Y(x_k)$, there exists $y_k' \in Y(\bar x)$ such that

\[
\|y_k - y_k'\| \le 2c\,\epsilon_k^{\tau} \to 0. \tag{2.7}
\]

Note that $Y(\bar x) \subseteq F$, $Y(\bar x)$ is a closed set and $F$ is compact. It follows that $Y(\bar x)$ is also a compact set.
Passing to the limit in (2.7), we see that $\bar y \in Y(\bar x)$. So, a global minimizer for problem (P) exists.


The following lemma of Putinar ([39]), which provides a characterization for positivity of a polynomial
over a system of polynomial inequalities, can also be regarded as a polynomial analog of Farkas'
lemma m. This lemma has been extensively used in polynomial optimization [^ and plays a key 
role in the convergence analysis of our proposed method later on. 

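Lemma 2.6 (Putinar's Positivstellensatz) [39] Let $f_0$ and $f_i$, $i = 1,\dots,p$, be real polynomials in
$w$ on $\mathbb{R}^v$. Suppose that there exist $R > 0$ and sums-of-squares polynomials $\hat\sigma_0, \hat\sigma_1, \dots, \hat\sigma_p \in \Sigma^2[w]$ such
that $R - \|w\|^2 = \hat\sigma_0(w) + \sum_{i=1}^{p} \hat\sigma_i(w) f_i(w)$ for all $w \in \mathbb{R}^v$. If $f_0(w) > 0$ over the set $\{w \in \mathbb{R}^v : f_i(w) \ge
0,\ i = 1,\dots,p\}$, then there exist $\sigma_i \in \Sigma^2[w]$, $i = 0,1,\dots,p$, such that $f_0 = \sigma_0 + \sum_{i=1}^{p} \sigma_i f_i$.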

The following assumption plays a key role throughout the paper.

Assumption 2.1: There exist $R_1, R_2 > 0$ such that the quadratic polynomials $(x,y) \mapsto R_1 -
\|(x,y)\|^2$ and $y \mapsto R_2 - \|y\|^2$ can be written as

\[
R_1 - \|(x,y)\|^2 = \sigma_0(x,y) - \sum_{i=1}^{s} \sigma_i(x,y) g_i(x,y) \quad \text{and} \quad R_2 - \|y\|^2 = \bar\sigma_0(y) - \sum_{j=1}^{r} \bar\sigma_j(y) h_j(y),
\]

for some sums-of-squares polynomials $\sigma_0, \sigma_1, \dots, \sigma_s \in \Sigma^2[x,y]$ and sums-of-squares polynomials
$\bar\sigma_0, \bar\sigma_1, \dots, \bar\sigma_r \in \Sigma^2[y]$.


We note that Assumption 2.1 implies that both $K = \{(x,y) \in \mathbb{R}^n \times \mathbb{R}^m : g_i(x,y) \le 0,\ i = 1,\dots,s\}$
and $F = \{y \in \mathbb{R}^m : h_j(y) \le 0,\ j = 1,\dots,r\}$ are compact sets [26]. Moreover, Assumption 2.1
can be easily satisfied when $K$ and $F$ are nonempty compact sets, and one knows the bounds $N_1$
for $\|x\|$ on $K$ and $N_2$ for $\|y\|$ on $F$. Indeed, in this case, it suffices to add redundant constraints
$g_{s+1}(x,y) = \|(x,y)\|^2 - (N_1^2 + N_2^2)$ and $h_{r+1}(y) = \|y\|^2 - N_2^2$ to the definition of $K$ and $F$ respectively,
and Assumption 2.1 is satisfied with $R_1 = N_1^2 + N_2^2$, $R_2 = N_2^2$, $\sigma_i \equiv 0$ for all $1 \le i \le s$, $\bar\sigma_j \equiv 0$ for all
$1 \le j \le r$ and $\sigma_{s+1} = \bar\sigma_{r+1} \equiv 1$. We also note that, under Assumption 2.1, a solution for problem (P)
exists by Corollary 2.5.


3 Convex Lower Level Problems 


In this section, we consider the convex polynomial bilevel programming problem (P) where the lower
level problem is convex in the sense that, for each $x \in \mathbb{R}^n$, $G(x,\cdot)$ is a convex polynomial, $h_j$ are
polynomials, $j = 1,\dots,r$, and the feasible set of the lower level problem $F := \{w \in \mathbb{R}^m : h_j(w) \le 0,\ j =
1,\dots,r\}$ is a convex set. We note that the representing polynomials $h_j$ which describe the convex
feasible set $F$ need not be convex, in general.

We say that the lower level convex problem of (P) satisfies the nondegeneracy condition if for each
$j = 1,\dots,r$,

\[
y \in F \text{ and } h_j(y) = 0 \implies \nabla h_j(y) \ne 0.
\]

Recall that the lower level convex problem of (P) is said to satisfy the Slater condition whenever there
exists $y_0 \in \mathbb{R}^m$ such that $h_j(y_0) < 0$, $j = 1,\dots,r$. Note that, under the Slater condition, the lower
level problem automatically satisfies the nondegeneracy condition if each $h_j$, $j = 1,\dots,r$, is a convex
polynomial.

Let us recall a lemma which provides a link between a KKT point and a minimizer for a convex 
optimization problem where the representing function of the convex feasible set is not necessarily 
convex. 

Lemma 3.1 Theorem 2.1]) Let $\phi$ be a convex function on $\mathbb{R}^m$ and $F := \{w \in \mathbb{R}^m : h_j(w) \le
0,\ j = 1,\dots,r\}$ be a convex set. Suppose that both the nondegeneracy condition and the Slater condition
hold. Then, a point $y$ is a global minimizer of $\min\{\phi(w) : w \in F\}$ if and only if $y$ is a KKT point of
$\min\{\phi(w) : w \in F\}$, in the sense that there exist $\lambda_j \ge 0$, $j = 1,\dots,r$, such that

\[
\nabla\phi(y) + \sum_{j=1}^{r} \lambda_j \nabla h_j(y) = 0, \quad \lambda_j h_j(y) = 0, \quad h_j(y) \le 0, \quad j = 1,\dots,r.
\]

We see in the following proposition that a polynomial bilevel programming problem with convex 
lower level problem can be equivalently rewritten as a single level polynomial optimization problem in 
a higher dimension under the nondegeneracy condition and the Slater condition. In the special case
where all the representing polynomials $h_j$ are convex, this result has been established in [12].

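Proposition 3.2 (Equivalent single-level problem) Consider problem (P) where the lower level
problem is convex. Suppose that the lower level problem satisfies both the nondegeneracy condition and
the Slater condition. Then, $(x,y) \in \mathbb{R}^n \times \mathbb{R}^m$ is a global solution of the bilevel polynomial optimization
problem (P) if and only if there exist Lagrange multipliers^1 $\lambda = (\lambda_0, \dots, \lambda_r) \in \mathbb{R}^{r+1}$ such that $(x,y,\lambda) \in
\mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^{r+1}$ is a global solution of the following single level polynomial optimization problem:

\[
\begin{array}{lll}
(\bar P) & \displaystyle \min_{x \in \mathbb{R}^n,\ y \in \mathbb{R}^m,\ \lambda \in \mathbb{R}^{r+1}} & f(x,y) \\
 & \text{subject to} & g_i(x,y) \le 0,\ i = 1,\dots,s, \\
 & & \lambda_0 \nabla_y G(x,y) + \sum_{j=1}^{r} \lambda_j \nabla h_j(y) = 0, \\
 & & \lambda_0 \ge 0,\ \sum_{j=0}^{r} \lambda_j^2 = 1,\ \lambda_j h_j(y) = 0,\ \lambda_j \ge 0,\ h_j(y) \le 0,\ j = 1,\dots,r.
\end{array} \tag{3.8}
\]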


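Proof. Fix any $x \in \mathbb{R}^n$. The conclusion will follow if we show that $y \in Y(x)$ is equivalent to the
condition that there exist $\lambda_j \ge 0$, $j = 0,1,\dots,r$, such that

\[
\begin{cases}
\lambda_0 \nabla_y G(x,y) + \sum_{j=1}^{r} \lambda_j \nabla h_j(y) = 0, \\
\lambda_j h_j(y) = 0,\ \lambda_j \ge 0,\ h_j(y) \le 0,\ j = 1,\dots,r, \\
\lambda_0 \ge 0,\ \sum_{j=0}^{r} \lambda_j^2 = 1.
\end{cases} \tag{3.9}
\]

To see the equivalence, we first assume that $y \in Y(x)$. Under both the nondegeneracy condition and
the Slater condition, the preceding lemma guarantees that there exist $\mu_j \ge 0$, $j = 1,\dots,r$, such that

\[
\nabla_y G(x,y) + \sum_{j=1}^{r} \mu_j \nabla h_j(y) = 0, \quad \mu_j h_j(y) = 0, \quad \mu_j \ge 0, \quad h_j(y) \le 0, \quad j = 1,\dots,r. \tag{3.10}
\]

So, (3.9) holds with

\[
\lambda_0 = \frac{1}{\sqrt{1 + \sum_{j=1}^{r} \mu_j^2}} \quad \text{and} \quad \lambda_j = \frac{\mu_j}{\sqrt{1 + \sum_{j=1}^{r} \mu_j^2}}, \quad j = 1,\dots,r.
\]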


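Conversely, let $(x,y,\lambda)$ satisfy (3.9). We now show that $\lambda_0 \ne 0$. Indeed, assume on the contrary
that $\lambda_0 = 0$. Then, $\sum_{j=1}^{r} \lambda_j \nabla h_j(y) = 0$, $\lambda_j h_j(y) = 0$, $\lambda_j \ge 0$ and $h_j(y) \le 0$, $j = 1,\dots,r$.
Let $J = \{j \in \{1,\dots,r\} : \lambda_j > 0\}$; then $J \ne \emptyset$ because $\sum_{j=0}^{r} \lambda_j^2 = 1$ and $\lambda_0 = 0$. From the Slater
condition, there exists $y_0 \in \mathbb{R}^m$ such that $h_j(y_0) < 0$, $j = 1,\dots,r$. Then, there exists $\rho > 0$ such that
$h_j(w) < 0$ for all $w \in \mathbb{R}^m$ with $\|w - y_0\| \le \rho$. As $\sum_{j=1}^{r} \lambda_j \nabla h_j(y) = 0$, we obtain

\[
\sum_{j=1}^{r} \lambda_j \nabla h_j(y)^T (w - y) = 0 \quad \text{for all } w \text{ with } \|w - y_0\| \le \rho. \tag{3.11}
\]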


^1 Indeed, as shown in the proof, $\lambda_0 \ne 0$ always holds under our assumptions. See Remark 3.3 for a detailed discussion.
We now see that $\nabla h_j(y)^T (w - y) \le 0$ for all $w$ with $\|w - y_0\| \le \rho$ and for all $j \in J$. (Suppose on the
contrary that there exist $w_0$ with $\|w_0 - y_0\| \le \rho$ and $j_0 \in J$ such that $\nabla h_{j_0}(y)^T (w_0 - y) > 0$. By
continuity, for all small $t > 0$, $h_{j_0}(y + t(w_0 - y)) > 0$, and hence $y + t(w_0 - y) \notin F$. On the other hand,
from our choice of $\rho$, we see that $h_j(w_0) < 0$ for all $j = 1,\dots,r$. So, $w_0 \in F$. It then follows from the
convexity of $F$ that $y + t(w_0 - y) \in F$ for all small $t > 0$. This is impossible.) This together with (3.11)
and $\lambda_j = 0$ for all $j \notin J$ shows that

\[
\nabla h_j(y)^T (w - y) = 0 \quad \text{for all } w \text{ with } \|w - y_0\| \le \rho \text{ and } j \in J,
\]

and so, $\nabla h_j(y) = 0$ for all $j \in J$. Note that $y \in F$ and $h_j(y) = 0$ for all $j \in J$. This contradicts the
nondegeneracy condition, and so, $\lambda_0 \ne 0$. Thus, by dividing both sides of the first relation of
(3.9) by $\lambda_0$, we see that (3.10) holds. This shows that $y \in Y(x)$ by the preceding lemma again.
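The two directions of this proof amount to a rescaling between ordinary KKT multipliers $\mu$ and the normalized multipliers $\lambda$ on the unit sphere. The following is a minimal sketch of that rescaling, assuming the normalization $\lambda_0 = 1/\sqrt{1 + \sum_j \mu_j^2}$ and $\lambda_j = \mu_j \lambda_0$; the function names and the sample data are hypothetical.

```python
import math

def normalize_multipliers(mu):
    """Map KKT multipliers (mu_1..mu_r) to (lambda_0..lambda_r) with unit 2-norm."""
    scale = math.sqrt(1.0 + sum(m * m for m in mu))
    return [1.0 / scale] + [m / scale for m in mu]

def recover_kkt_multipliers(lam):
    """Inverse map, valid only when lambda_0 > 0: mu_j = lambda_j / lambda_0."""
    if lam[0] <= 0:
        raise ValueError("lambda_0 must be positive to recover KKT multipliers")
    return [l / lam[0] for l in lam[1:]]

# Illustrative data: lower-level problem min_w {x*w : w - 1 <= 0, -1 - w <= 0}
# at x = -1 with minimizer y = 1. Stationarity x + mu_1 - mu_2 = 0 with
# mu_2 = 0 (inactive constraint) gives mu = (1, 0).
mu = [1.0, 0.0]
lam = normalize_multipliers(mu)          # approximately (0.7071, 0.7071, 0)
mu_back = recover_kkt_multipliers(lam)   # recovers (1, 0)
```

The inverse map is only defined when $\lambda_0 > 0$, which is exactly what the nondegeneracy and Slater conditions guarantee.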


Remark 3.3 (Importance of nondegeneracy and Slater's conditions) In Proposition 3.2 we
require that the nondegeneracy condition and the Slater condition hold. These assumptions provide
us a simple uniform bound for the multipliers $\lambda_0, \dots, \lambda_r$ in the lower level problem which plays an
important role in our convergence analysis later in Theorem 3.5. Indeed, these assumptions ensure
that $\lambda_0 \ne 0$, and so, in particular, the equivalence of the following two systems:

\[
\begin{cases}
\lambda_0 \nabla_y G(x,y) + \sum_{j=1}^{r} \lambda_j \nabla h_j(y) = 0, \\
\lambda_0 \ge 0,\ \sum_{j=0}^{r} \lambda_j^2 = 1,\ \lambda_j h_j(y) = 0,\ \lambda_j \ge 0,\ h_j(y) \le 0,\ j = 1,\dots,r,
\end{cases}
\]

and

\[
\begin{cases}
\nabla_y G(x,y) + \sum_{j=1}^{r} \mu_j \nabla h_j(y) = 0, \\
\mu_j h_j(y) = 0,\ \mu_j \ge 0,\ h_j(y) \le 0,\ j = 1,\dots,r.
\end{cases}
\]

Note that the non-degeneracy condition is satisfied when the representing functions hj, j = 1,..., r, 
are convex polynomials and the Slater condition holds. Thus, in this special case, the Slater condition 
alone is enough for transforming the polynomial bilevel problem with a convex lower level problem to 
a single-level polynomial optimization problem. 

The following simple example illustrates that the preceding Proposition can be applied to the case
where the $h_j$'s need not be convex polynomials.

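Example 3.4 Consider the bilevel problem

\[
\begin{array}{lll}
(EP_1) & \displaystyle \min_{x \in \mathbb{R},\ y \in \mathbb{R}^2} & -x^6 + y_1^2 + y_2^2 \\
 & \text{subject to} & x^2 + y_1^2 + y_2^2 \le 2, \\
 & & y \in Y(x) := \operatorname{argmin}_{w \in \mathbb{R}^2}\{x(w_1 + w_2) : 1 - w_1 w_2 \le 0,\ 0 \le w_1 \le 1,\ 0 \le w_2 \le 1\}.
\end{array}
\]

Clearly, the lower level problem of $(EP_1)$ is convex but the polynomial $(w_1, w_2) \mapsto 1 - w_1 w_2$ is not
convex. It can be verified that the non-degeneracy condition and Slater condition hold, and so, $(EP_1)$
is equivalent to the following single level polynomial optimization problem

\[
\begin{array}{ll}
\displaystyle \min_{x \in \mathbb{R},\ y \in \mathbb{R}^2,\ (\lambda_0,\dots,\lambda_5) \in \mathbb{R}^6} & -x^6 + y_1^2 + y_2^2 \\
\text{subject to} & x^2 + y_1^2 + y_2^2 \le 2, \\
 & 1 - y_1 y_2 \le 0,\ 0 \le y_1 \le 1,\ 0 \le y_2 \le 1, \\
 & \lambda_0 x + \lambda_1(-y_2) - \lambda_2 + \lambda_3 = 0, \\
 & \lambda_0 x + \lambda_1(-y_1) - \lambda_4 + \lambda_5 = 0, \\
 & \lambda_1(1 - y_1 y_2) = 0,\ \lambda_2 y_1 = 0,\ \lambda_3(1 - y_1) = 0, \\
 & \lambda_4 y_2 = 0,\ \lambda_5(1 - y_2) = 0, \\
 & \lambda_j \ge 0,\ j = 0,1,\dots,5,\ \sum_{j=0}^{5} \lambda_j^2 = 1.
\end{array}
\]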

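Proposition 3.2 enables us to construct a sequence of semidefinite programming problems for solving
a polynomial bilevel programming problem with a convex lower level problem. To do this, we denote

\[
G_p(x,y,\lambda) := \begin{cases}
g_p(x,y), & p = 1,\dots,s, \\
h_{p-s}(y), & p = s+1,\dots,s+r, \\
-\lambda_{p-s-r-1}, & p = s+r+1,\dots,s+2r+1,
\end{cases}
\]

and

\[
H_q(x,y,\lambda) := \begin{cases}
\lambda_q h_q(y), & q = 1,\dots,r, \\
\big(\lambda_0 \nabla_y G(x,y) + \sum_{j=1}^{r} \lambda_j \nabla h_j(y)\big)_{q-r}, & q = r+1,\dots,r+m, \\
\sum_{j=0}^{r} \lambda_j^2 - 1, & q = r+m+1,
\end{cases}
\]

where $\big(\lambda_0 \nabla_y G(x,y) + \sum_{j=1}^{r} \lambda_j \nabla h_j(y)\big)_i$ is the $i$th coordinate of $\lambda_0 \nabla_y G(x,y) + \sum_{j=1}^{r} \lambda_j \nabla h_j(y)$, $i =
1,\dots,m$. We also denote the degree of $G_p$ to be $u_p$ and the degree of $H_q$ to be $v_q$.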


We now introduce a sequence of sums-of-squares relaxation problems as follows: for each $k \in \mathbb{N}$,

\[
\begin{array}{lll}
(Q_k) & \displaystyle \max_{\mu,\ \sigma_p,\ \phi_q} & \mu \\
 & \text{s.t.} & f - \mu = \sigma_0 - \sum_{p=1}^{s+2r+1} \sigma_p G_p - \sum_{q=1}^{r+m+1} \phi_q H_q, \\
 & & \sigma_p \in \Sigma^2[x,y,\lambda],\ p = 0,1,\dots,s+2r+1, \\
 & & \deg \sigma_0 \le 2k,\ \deg(\sigma_p G_p) \le 2k,\ p = 1,\dots,s+2r+1, \\
 & & \phi_q \in \mathbb{R}[x,y,\lambda],\ \deg(\phi_q H_q) \le 2k,\ q = 1,\dots,r+m+1.
\end{array} \tag{3.12}
\]

It is known that each $(Q_k)$ can be reformulated as a semidefinite programming problem [26].
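To make the constraint of such a relaxation concrete, the sketch below checks a sums-of-squares certificate for a toy single-level problem that is not taken from the paper: minimize $f(x) = x$ subject to $g(x) = x^2 - 1 \le 0$. With $\mu = -1$, $\sigma_0 = \tfrac12 (x+1)^2$ and $\sigma_1 = \tfrac12$, the identity $f - \mu = \sigma_0 - \sigma_1 g$ holds with all degrees at most $2k$ for $k = 1$. The helper names are hypothetical, and the code only verifies a given certificate; it does not solve the semidefinite program.

```python
from fractions import Fraction

# Univariate polynomials as coefficient lists: p[i] is the coefficient of x^i.

def padd(a, b):
    """Coefficient-wise sum of two polynomials."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]

def pmul(a, b):
    """Product of two polynomials."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def pscale(c, a):
    """Scalar multiple c * a."""
    return [c * ai for ai in a]

def trim(a):
    """Drop trailing zero coefficients."""
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

f = [0, 1]                     # f(x) = x
g = [-1, 0, 1]                 # g(x) = x^2 - 1, constraint g <= 0
mu = -1                        # candidate lower bound
half = Fraction(1, 2)
sigma0 = pscale(half, pmul([1, 1], [1, 1]))   # (1/2) * (x + 1)^2, a square
sigma1 = [half]                               # nonnegative constant, also SOS

lhs = trim(padd(f, [-mu]))                                  # f - mu
rhs = trim(padd(sigma0, pscale(-1, pmul(sigma1, g))))       # sigma0 - sigma1*g
certificate_holds = (lhs == rhs)
```

Since the true minimum of the toy problem is $-1$, this certificate shows that the first relaxation is already exact there; in general an SDP solver is needed to search for the multipliers.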
Theorem 3.5 (Convex lower level problem: Convergence theorem) Consider the problem
(P) where the lower level problem is convex. Suppose that Assumption 2.1 holds and that the lower
level problem satisfies both the nondegeneracy condition and the Slater condition. Then, $\mathrm{val}(Q_k) \le
\mathrm{val}(Q_{k+1})$ for all $k \in \mathbb{N}$ and $\mathrm{val}(Q_k) \to \mathrm{val}(P)$ as $k \to \infty$, where $\mathrm{val}(Q_k)$ and $\mathrm{val}(P)$ denote the
optimal values of the problems $(Q_k)$ and (P) respectively.


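Proof. From Corollary 2.5, a global solution of (P) exists. Let $(\bar x, \bar y)$ be a global solution of (P). From
Proposition 3.2, there exists $\bar\lambda \in \mathbb{R}^{r+1}$ such that $(\bar x, \bar y, \bar\lambda)$ is a solution of $(\bar P)$ and $\mathrm{val}(P) = \mathrm{val}(\bar P)$.

From the construction of $(Q_k)$, $k \in \mathbb{N}$, it can be easily verified that $\mathrm{val}(Q_k) \le \mathrm{val}(Q_{k+1}) \le \mathrm{val}(\bar P)$
for all $k \in \mathbb{N}$. Let $\epsilon > 0$. Define $\hat f(x,y,\lambda) = f(x,y) - (\mathrm{val}(\bar P) - \epsilon)$. Note that the feasible set $U$ of $(\bar P)$
can be written as

\[
U = \{(x,y,\lambda) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^{r+1} : -G_p(x,y,\lambda) \ge 0,\ p = 1,\dots,s+2r+1,\ -H_q(x,y,\lambda) \ge 0,\ H_q(x,y,\lambda) \ge 0,\ q = 1,\dots,r+m+1\}.
\]

Then, we see that $\hat f > 0$ over $U$. We now verify that the conditions in Putinar's Positivstellensatz
(Lemma 2.6) are satisfied. To see this, from Assumption 2.1, there exist $R_1, R_2 > 0$ such that

\[
R_1 - \|(x,y)\|^2 = \sigma_0(x,y) - \sum_{i=1}^{s} \sigma_i(x,y) g_i(x,y) \quad \text{and} \quad R_2 - \|y\|^2 = \bar\sigma_0(y) - \sum_{j=1}^{r} \bar\sigma_j(y) h_j(y),
\]

for some sums-of-squares polynomials $\sigma_0, \sigma_1, \dots, \sigma_s \in \Sigma^2[x,y]$ and sums-of-squares polynomials
$\bar\sigma_0, \bar\sigma_1, \dots, \bar\sigma_r \in \Sigma^2[y]$. Letting $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_r) \in \mathbb{R}^{r+1}$ and adding the two identities to
$1 - \sum_{j=0}^{r} \lambda_j^2$ and to the sum of squares $\|y\|^2$, we obtain that

\begin{align*}
(1 + R_1 + R_2) - \|(x,y,\lambda)\|^2
&= \big(\sigma_0(x,y) + \bar\sigma_0(y) + \|y\|^2\big) - \sum_{j=1}^{r} \bar\sigma_j(y) h_j(y) - \sum_{i=1}^{s} \sigma_i(x,y) g_i(x,y) + \Big(1 - \sum_{j=0}^{r} \lambda_j^2\Big) \\
&= \big(\sigma_0(x,y) + \bar\sigma_0(y) + \|y\|^2\big) - \sum_{j=1}^{r} \bar\sigma_j(y) G_{s+j}(x,y,\lambda) - \sum_{i=1}^{s} \sigma_i(x,y) G_i(x,y,\lambda) - H_{r+m+1}(x,y,\lambda).
\end{align*}

So, applying Putinar's Positivstellensatz (Lemma 2.6) with $w = (x,y,\lambda) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^{r+1}$, there exist
sums-of-squares polynomials $\sigma_p \in \Sigma^2[x,y,\lambda]$, $p = 0,1,\dots,s+2r+1$, and sums-of-squares polynomials
$\phi_{1q}, \phi_{2q} \in \Sigma^2[x,y,\lambda]$, $q = 1,\dots,r+m+1$, such that

\[
\hat f = \sigma_0 - \sum_{p=1}^{s+2r+1} \sigma_p G_p - \sum_{q=1}^{r+m+1} \phi_{1q} H_q + \sum_{q=1}^{r+m+1} \phi_{2q} H_q.
\]

Let $\phi_q \in \mathbb{R}[x,y,\lambda]$ be a real polynomial defined by $\phi_q = \phi_{1q} - \phi_{2q}$, $q = 1,\dots,r+m+1$. Then, we
have

\[
f - (\mathrm{val}(\bar P) - \epsilon) = \sigma_0 - \sum_{p=1}^{s+2r+1} \sigma_p G_p - \sum_{q=1}^{r+m+1} \phi_q H_q.
\]

Thus, there exists $k \in \mathbb{N}$ such that $\mathrm{val}(Q_k) \ge \mathrm{val}(\bar P) - \epsilon = \mathrm{val}(P) - \epsilon$. Note that, by the construction,
$\mathrm{val}(Q_k) \le \mathrm{val}(\bar P) = \mathrm{val}(P)$ for all $k \in \mathbb{N}$. Therefore, $\mathrm{val}(Q_k) \to \mathrm{val}(\bar P) = \mathrm{val}(P)$.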
Remark 3.6 (Convergence to a global minimizer) It is worth noting that, in addition to the
assumptions of Theorem 3.5, if we further assume that the equivalent problem $(\bar P)$ has a unique solution,
say $(\bar x, \bar y)$, then we can also find the global minimizer $(\bar x, \bar y)$ with the help of the above sequential SDP
relaxation problems. In fact, as each $(Q_k)$ is a semidefinite programming problem, its corresponding
dual problem (see [26]) can be formulated as

\[
\begin{array}{lll}
(Q_k^*) & \displaystyle \inf_{z = (z_\alpha),\ \alpha \in \mathbb{N}^{n+m+r+1}_{2k}} & L_z(f) \\
 & \text{subject to} & M_k(z) \succeq 0,\ z_0 = 1, \\
 & & M_{k - \bar u_p}(G_p, z) \preceq 0,\ p = 1,\dots,s+2r+1, \\
 & & M_{k - \bar v_q}(H_q, z) = 0,\ q = 1,\dots,r+m+1,
\end{array}
\]

where $\bar u_p$ (resp. $\bar v_q$) is the smallest integer which is not smaller than $u_p/2$ (resp. $v_q/2$), $L_z$ is the Riesz functional
defined by $L_z(f) = \sum_\alpha f_\alpha z_\alpha$ with $f(x) = \sum_\alpha f_\alpha x^\alpha$ for a polynomial $f$, and $M_t(f,z)$, $t \in \mathbb{N}$, is the
so-called localization matrix defined by $[M_t(f,z)]_{\alpha,\beta} = \sum_\gamma f_\gamma z_{\alpha+\beta+\gamma}$ for all $\alpha, \beta \in \mathbb{N}^{n+m+r+1}_t$ (with
$M_t(z) := M_t(1,z)$ the moment matrix). From
the weak duality, one has $\mathrm{val}(\bar P) \ge \mathrm{val}(Q_k^*) \ge \mathrm{val}(Q_k)$. Thus, the preceding theorem together with
$\mathrm{val}(\bar P) = \mathrm{val}(P)$ implies that $\mathrm{val}(Q_k^*) \to \mathrm{val}(P)$. Moreover, it was shown in [25, Theorem 4.2] that
if the feasible set of the polynomial optimization problem $(\bar P)$ has a non-empty interior, then there
exists a natural number $N_0$ such that $\mathrm{val}(Q_k^*) = \mathrm{val}(Q_k)$ for all $k \ge N_0$.

Let $z_k$ be a solution of $(Q_k^*)$. Then, as $k \to \infty$, we have $(L_{z_k}(X_1), \dots, L_{z_k}(X_n)) \to \bar x$ and
$(L_{z_k}(X_{n+1}), \dots, L_{z_k}(X_{n+m})) \to \bar y$, where $X_i$ denotes the polynomial which maps each vector to its
$i$th coordinate, $i = 1,\dots,n+m$. The conclusion follows from m-
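The objects in this remark can be illustrated on the simplest possible moment sequence, that of a Dirac measure. The sketch below assumes the definitions $L_z(f) = \sum_\alpha f_\alpha z_\alpha$ and $[M_t(f,z)]_{\alpha,\beta} = \sum_\gamma f_\gamma z_{\alpha+\beta+\gamma}$, with the moment matrix obtained for $f \equiv 1$. The point $(-1, 1)$, the polynomial $g$ and all function names are illustrative choices, and no SDP is solved.

```python
from itertools import product

def monomials(nvars, t):
    """Exponent tuples alpha in N^nvars with |alpha| <= t."""
    return [a for a in product(range(t + 1), repeat=nvars) if sum(a) <= t]

def dirac_moments(point, t):
    """Moments z_alpha = point^alpha of the Dirac measure at `point`, |alpha| <= t."""
    z = {}
    for alpha in monomials(len(point), t):
        value = 1.0
        for coord, power in zip(point, alpha):
            value *= coord ** power
        z[alpha] = value
    return z

def riesz(poly, z):
    """Riesz functional L_z(f) = sum_alpha f_alpha * z_alpha."""
    return sum(coeff * z[alpha] for alpha, coeff in poly.items())

def localizing_matrix(poly, z, nvars, t):
    """[M_t(f, z)]_{alpha, beta} = sum_gamma f_gamma * z_{alpha + beta + gamma}."""
    basis = monomials(nvars, t)
    def entry(alpha, beta):
        return sum(
            coeff * z[tuple(a + b + c for a, b, c in zip(alpha, beta, gamma))]
            for gamma, coeff in poly.items()
        )
    return [[entry(alpha, beta) for beta in basis] for alpha in basis]

point = (-1.0, 1.0)                 # illustrative point (x, y)
z = dirac_moments(point, 4)         # degree 4 = 2*t + deg(g) with t = 1
one = {(0, 0): 1.0}
X1, X2 = {(1, 0): 1.0}, {(0, 1): 1.0}
g = {(2, 0): 1.0, (0, 2): 1.0, (0, 0): -2.0}   # g(x, y) = x^2 + y^2 - 2

M1 = localizing_matrix(one, z, 2, 1)   # moment matrix M_1(z), rank one here
Mg = localizing_matrix(g, z, 2, 1)     # localizing matrix of g, zero since g(point) = 0
recovered = (riesz(X1, z), riesz(X2, z))
```

For a Dirac measure the first-order moments return the point itself, which is the mechanism behind reading a minimizer off $L_{z_k}(X_i)$ when the solution is unique.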

The above theorem shows that one can use a sequence of semidefinite programming problems to 
approximate the global optimal value of a bilevel polynomial optimization problem with convex lower 
level problem. Moreover, under a sufficient rank condition (see [26, Theorem 5.5]), one can check
whether finite convergence has occurred, i.e., by testing whether $\mathrm{val}(Q_{k_0}) = \mathrm{val}(P)$ for some $k_0 \in \mathbb{N}$.
This rank condition has been implemented in the software GloptiPoly 3 [18] along with a linear algebra
procedure to extract global minimizers of a polynomial optimization problem. 

We now provide a simple example to illustrate how to use sequential SDP relaxations to solve bilevel polynomial optimization problems with a convex lower-level problem.

Example 3.7 (Solution by sequential SDP relaxations) Consider the following simple bilevel polynomial optimization problem

$$
\begin{array}{ll}
\min\limits_{(x,y)\in\mathbb{R}^2} & xy - y \\
\text{subject to} & x^2 + y^2 \le 2, \\
& y \in Y(x) := \operatorname{argmin}_{w\in\mathbb{R}}\{xw : -1 \le w \le 1\}.
\end{array}
$$

Direct verification shows that there are two global solutions $(-1,1)$ and $(1,-1)$ with global optimal value $-2$. We note that the lower-level problem is convex, so the bilevel problem is equivalent to the following single-level polynomial optimization problem

$$
\begin{array}{ll}
\min\limits_{(x,y,\lambda_0,\lambda_1,\lambda_2)\in\mathbb{R}^5} & xy - y \\
\text{subject to} & x^2 + y^2 \le 2, \\
& \lambda_0 x + \lambda_1 - \lambda_2 = 0, \\
& \lambda_0,\lambda_1,\lambda_2 \ge 0,\ \lambda_1(y-1) = 0,\ \lambda_2(-1-y) = 0,\ -1 \le y \le 1, \\
& \lambda_0^2 + \lambda_1^2 + \lambda_2^2 = 1.
\end{array}
$$

Solving the converted single-level polynomial optimization problem using GloptiPoly 3, the solver extracted two global solutions $(x,y,\lambda_0,\lambda_1,\lambda_2) = (-1.000, 1.000, 0.7071, 0.7071, 0)$ and $(x,y,\lambda_0,\lambda_1,\lambda_2) = (1.000, -1.000, 0.7071, 0, 0.7071)$ with the true global optimal value $-2$.
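The two points reported by the solver can be checked against the multiplier conditions of the single-level reformulation. The following is a minimal sketch, assuming the constraints are the stationarity equation $\lambda_0 x + \lambda_1 - \lambda_2 = 0$, the two complementarity equations, nonnegative multipliers, and the normalization $\lambda_0^2 + \lambda_1^2 + \lambda_2^2 = 1$; the function names and the tolerance are illustrative choices, not part of the paper or of GloptiPoly.

```python
def fritz_john_residuals(x, y, lam0, lam1, lam2):
    """Residuals of the equality constraints in the single-level reformulation."""
    return {
        "stationarity": lam0 * x + lam1 - lam2,
        "compl_upper": lam1 * (y - 1.0),
        "compl_lower": lam2 * (-1.0 - y),
        "normalization": lam0**2 + lam1**2 + lam2**2 - 1.0,
    }


def is_feasible(x, y, lam0, lam1, lam2, tol=1e-3):
    """Check equalities and inequalities up to a tolerance (assumed, for 4-digit data)."""
    residuals = fritz_john_residuals(x, y, lam0, lam1, lam2)
    equalities_ok = all(abs(r) <= tol for r in residuals.values())
    inequalities_ok = (
        x**2 + y**2 <= 2.0 + tol
        and -1.0 - tol <= y <= 1.0 + tol
        and min(lam0, lam1, lam2) >= -tol
    )
    return equalities_ok and inequalities_ok


# Points as printed in the example (rounded to four digits).
reported = [
    (-1.000, 1.000, 0.7071, 0.7071, 0.0),
    (1.000, -1.000, 0.7071, 0.0, 0.7071),
]

for point in reported:
    print(point, is_feasible(*point))
```

The check only concerns feasibility of the multiplier system; it does not depend on the upper-level objective.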


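Remark 3.8 (Single level polynomial problem) In the case where (P) is a single-level problem, Theorem 3.5 yields the known convergence result of the sequential SDP relaxation scheme (often referred to as the Lasserre hierarchy) for solving single-level polynomial optimization problems [26]. Indeed, consider a (single-level) polynomial optimization problem

$$(P_0)\quad \min_{x\in\mathbb{R}^n}\{f(x) : g_i(x) \le 0,\ i = 1,\ldots,s\}.$$

Suppose that there exist $R > 0$ and sums-of-squares polynomials $\sigma_i \in \Sigma^2[x]$, $i = 0,1,\ldots,s$, such that

$$R - \|x\|^2 = \sigma_0(x) - \sum_{i=1}^{s} \sigma_i(x) g_i(x).$$

Let $f(x,y) = f(x)$, $g_i(x,y) = g_i(x)$, $i = 1,\ldots,s$, and $G(x,y) = 0$ for all $(x,y) \in \mathbb{R}^n \times \mathbb{R}$. We note that $\mathrm{val}(P_0)$ equals the optimal value of the following bilevel polynomial optimization problem

$$
\begin{array}{ll}
\min\limits_{x\in\mathbb{R}^n,\,y\in\mathbb{R}} & f(x,y) \\
\text{subject to} & g_i(x,y) \le 0,\ i = 1,\ldots,s, \\
& y \in Y(x) := \operatorname{argmin}_{w\in\mathbb{R}}\{0 : \ldots\}.
\end{array}
$$

Then, Theorem 3.5 yields that $\mathrm{val}(P_0) = \lim_{k\to\infty} \mathrm{val}(Q_k^0)$, where, for each $k$, the problem $(Q_k^0)$ is given by

$$
(Q_k^0)\quad
\begin{array}{ll}
\max\limits_{\mu,\sigma_0,\ldots,\sigma_s} & \mu \\
\text{s.t.} & f - \mu = \sigma_0 - \sum\limits_{p=1}^{s} \sigma_p g_p, \\
& \sigma_p \in \Sigma^2[x],\ p = 0,1,\ldots,s,\ \deg \sigma_0 \le 2k,\ \deg(\sigma_p g_p) \le 2k,\ p = 1,\ldots,s.
\end{array}
$$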


4 Nonconvex Lower Level Problems 


In this section, we examine how to solve a bilevel polynomial optimization problem with a nonconvex lower-level problem towards a global minimizer using semidefinite programming hierarchies.

Consider an $\epsilon$-approximation of the general bilevel polynomial problem (P):

$$
(P_\epsilon)\quad
\begin{array}{ll}
\min\limits_{(x,y)\in\mathbb{R}^n\times\mathbb{R}^m} & f(x,y) \\
\text{subject to} & g_i(x,y) \le 0,\ i = 1,\ldots,s, \\
& h_j(y) \le 0,\ j = 1,\ldots,r, \\
& G(x,y) - \min\limits_{w\in\mathbb{R}^m}\{G(x,w) : h_j(w) \le 0,\ j = 1,\ldots,r\} \le \epsilon.
\end{array}
$$

The above $\epsilon$-approximation problem plays a key role in the so-called value function approach for finding a stationary point of a bilevel programming problem, and has been studied and used widely in the literature (for example, see [31, ?]). The main idea of the value function approach is to further approximate the (possibly nonsmooth and nonconvex) function $x \mapsto \min_{w\in\mathbb{R}^m}\{G(x,w) : h_j(w) \le 0,\ j = 1,\ldots,r\}$ using smooth functions, and to solve the problem asymptotically by smooth local optimization techniques (such as the projected gradient (PG) method and sequential quadratic programming (SQP) techniques). For instance, [31] uses this approach together with the smoothing projected gradient method to solve the bilevel optimization problem in the case where $g_i$ depends on $x$ only, and $\{x \in \mathbb{R}^n : g_i(x) \le 0\}$ and $\{w \in \mathbb{R}^m : h_j(w) \le 0\}$ are convex sets. The algorithm only converges to a stationary point of the original problem (in a suitable sense).

We now introduce a general purpose scheme which enables us to solve $(P_\epsilon)$ towards global solutions using SDP hierarchies. The proof techniques for the convergence of this scheme (Theorem 4.7) rely on the joint-marginal method introduced in [27] to approximate a global solution of a parameterized single-level polynomial optimization problem. Here, following the approach in [27], we extend the scheme and its convergence analysis to the bilevel polynomial optimization setting.

The following known simple lemma shows that the problem $(P_\epsilon)$ indeed approximates the original bilevel polynomial optimization problem as $\epsilon \to 0^+$. To do this, for $\epsilon, \delta > 0$, recall that $(x,y)$ is called a $\delta$-global solution of $(P_\epsilon)$ if $(x,y)$ is feasible for $(P_\epsilon)$ and $f(x,y) \le \mathrm{val}(P_\epsilon) + \delta$, where $\mathrm{val}(P_\epsilon)$ is the optimal value of $(P_\epsilon)$.

Lemma 4.1 (Approximation lemma, cf. [32]) Suppose that $K := \{(x,y) \in \mathbb{R}^n \times \mathbb{R}^m : g_i(x,y) \le 0\}$ and $F = \{w \in \mathbb{R}^m : h_j(w) \le 0\}$ are compact. Let $\epsilon_k \to 0^+$ and $\delta_k \to 0^+$ as $k \to \infty$. Let $(x_k,y_k)$ be a $\delta_k$-global solution for $(P_{\epsilon_k})$. Then, $\{(x_k,y_k)\}_{k\in\mathbb{N}}$ is a bounded sequence and any of its cluster points $(\bar{x},\bar{y})$ is a solution of the bilevel polynomial optimization problem (P).

The following lemma explains the analytic property of the function $\epsilon \mapsto \mathrm{val}(P_\epsilon)$, and shows that $\mathrm{val}(P_\epsilon)$ converges to $\mathrm{val}(P)$ in the order of $O(\epsilon^{1/q})$ as $\epsilon \to 0^+$ for some $q \in \mathbb{N}_{>0} := \mathbb{N}\setminus\{0\}$. The proof relies on some important properties of semialgebraic functions and sets, and is deferred to Appendix B.

Lemma 4.2 (Analytic property & approximation quality) Suppose that Assumption 2.1 holds. Let $I \subseteq \mathbb{R}_+ := [0,+\infty)$ be a finite interval. For each $\epsilon \in I$, denote the optimal value of $(P_\epsilon)$ by $\mathrm{val}(P_\epsilon)$. Then,

(i) the one-dimensional function $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is a nonincreasing, lower semicontinuous, right-continuous and semialgebraic function on $I$. In particular, the function $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is continuous over $I$ except at finitely many points.

(ii) There exist $q \in \mathbb{N}_{>0}$, $\epsilon_0 > 0$ and $M > 0$ such that for all $\epsilon \in [0,\epsilon_0)$

$$\mathrm{val}(P_\epsilon) \le \mathrm{val}(P) \le \mathrm{val}(P_\epsilon) + M\epsilon^{1/q}.$$


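Now, we present a simple example to illustrate the above lemma. It also shows that, in general, the function $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ can be a discontinuous semialgebraic function.

Example 4.3 Consider the bilevel polynomial optimization problem

$$
(EP)\quad
\begin{array}{ll}
\min\limits_{(x,y)\in\mathbb{R}^2} & y \\
\text{s.t.} & x^2 \le 1, \\
& y \in \operatorname{argmin}_{w\in\mathbb{R}}\{x^2 + w^2 : w^2(w^2-1)^2 \le 0\}.
\end{array}
$$

Note that $J(x) = \min_{w\in\mathbb{R}}\{x^2 + w^2 : w^2(w^2-1)^2 \le 0\} = x^2$. Its $\epsilon$-approximation problem is

$$
(EP_\epsilon)\quad
\begin{array}{ll}
\min\limits_{(x,y)\in\mathbb{R}^2} & y \\
\text{s.t.} & x^2 \le 1, \\
& y^2(y^2-1)^2 \le 0,\ y^2 \le \epsilon.
\end{array}
$$

It can be verified that

$$
\mathrm{val}(EP_\epsilon) =
\begin{cases}
0, & \text{if } 0 \le \epsilon < 1, \\
-1, & \text{if } \epsilon \ge 1.
\end{cases}
$$

Therefore the function $\epsilon \mapsto \mathrm{val}(EP_\epsilon)$ is nonincreasing, lower semicontinuous, right-continuous and semialgebraic on $[0,+\infty)$. Moreover, it is continuous on $[0,\epsilon_0]$ for any $\epsilon_0 < 1$ and it is discontinuous at 1.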
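The step behaviour of the value function in this example can be reproduced by direct enumeration. The sketch below assumes the $\epsilon$-approximation constraints are $y^2(y^2-1)^2 \le 0$ and $y^2 \le \epsilon$, so that only $y \in \{-1, 0, 1\}$ can be feasible and $x$ plays no role in the objective; the function name `val_EP` is an illustrative choice.

```python
def val_EP(eps):
    """Optimal value of the eps-approximation problem, by enumeration.

    The constraint y^2 (y^2 - 1)^2 <= 0 forces y in {-1, 0, 1}; the
    constraint y^2 <= eps then selects the admissible candidates.
    Any x with x^2 <= 1 is feasible and does not affect the objective y.
    """
    candidates = [y for y in (-1.0, 0.0, 1.0) if y * y <= eps]
    return min(candidates)


for eps in (0.0, 0.5, 0.999, 1.0, 2.0):
    print(eps, val_EP(eps))
```

The jump occurs exactly at $\epsilon = 1$, where $y = -1$ first becomes admissible, which matches the right-continuity stated in Lemma 4.2(i).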


Solving $\epsilon$-approximation problems via sequential SDP relaxations

Here, we describe how to solve an $\epsilon$-approximation problem via a sequence of SDP relaxation problems. One of the key steps is to construct a sequence of polynomials to approximate the optimal value function of the lower-level problem $x \mapsto \min_{w\in\mathbb{R}^m}\{G(x,w) : h_j(w) \le 0,\ j = 1,\ldots,r\}$. In general, the optimal value function of the lower-level problem is merely a continuous function. We now recall a procedure introduced in [27] to approximate this optimal value function by a sequence of polynomials.

Recall that $K = \{(x,y) : g_i(x,y) \le 0,\ i = 1,\ldots,s\}$. We denote $\mathrm{Pr}_1 K = \{x \in \mathbb{R}^n : (x,y) \in K \text{ for some } y \in \mathbb{R}^m\}$. From Assumption 2.1, $K$ is bounded, and so, $\mathrm{Pr}_1 K$ is also bounded. Let $\mathrm{Pr}_1 K \subseteq \Omega := \{x \in \mathbb{R}^n : \|x\|_\infty \le M\}$ for some $M > 0$. Let $\theta_l(x) = x_l^2 - M^2$, $l = 1,\ldots,n$. Then $\Omega = \{x : \theta_l(x) \le 0,\ l = 1,\ldots,n\}$. Let $\varphi$ be a probability Borel measure supported on $\Omega$ with uniform distribution on $\Omega$. We note that all the moments of $\varphi$ over $\Omega$, denoted by $\gamma = (\gamma_\beta)$, $\beta \in \mathbb{N}^n$, and defined by

$$\gamma_\beta := \int_\Omega x^\beta \, d\varphi(x), \quad \beta \in \mathbb{N}^n,$$

can be easily computed (see [27]).
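For the uniform probability measure on the box $\{x : \|x\|_\infty \le M\}$ the moments factor coordinate-wise, since $\frac{1}{2M}\int_{-M}^{M} t^b\,dt$ equals $M^b/(b+1)$ for even $b$ and $0$ for odd $b$. A minimal sketch of this computation follows; the function name `box_moment` and the use of exact rational arithmetic are illustrative choices.

```python
from fractions import Fraction


def box_moment(beta, M=1):
    """Moment of x^beta under the uniform probability measure on [-M, M]^n.

    beta is a tuple of nonnegative integer exponents, one per coordinate.
    """
    M = Fraction(M)
    value = Fraction(1)
    for b in beta:
        if b % 2 == 1:
            return Fraction(0)      # odd powers vanish by symmetry
        value *= M**b / (b + 1)     # (1/(2M)) * integral of t^b over [-M, M]
    return value


print(box_moment((0, 0)))        # total mass of the probability measure
print(box_moment((2,)))          # second moment on [-1, 1]
print(box_moment((2, 2), M=2))   # mixed moment on [-2, 2]^2
```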


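For each $k \in \mathbb{N}$ with $k \ge k_0 := \max\{\ldots\}$, set

$$\mathbb{N}^n_{2k} := \Big\{(\alpha_1,\ldots,\alpha_n) \in \mathbb{N}^n : \sum_{l=1}^{n} \alpha_l \le 2k\Big\}$$

and consider the following optimization problem

$$
\begin{array}{ll}
\max\limits_{\lambda,\sigma_0,\ldots,\sigma_{r+n}} & \sum\limits_{\beta\in\mathbb{N}^n_{2k}} \lambda_\beta \gamma_\beta \\
\text{s.t.} & G(x,y) - \sum\limits_{\beta\in\mathbb{N}^n_{2k}} \lambda_\beta x^\beta = \sigma_0(x,y) - \sum\limits_{j=1}^{r} \sigma_j(x,y) h_j(y) - \sum\limits_{l=1}^{n} \sigma_{r+l}(x,y)\theta_l(x), \\
& \sigma_j \in \Sigma^2[x,y],\ j = 0,1,\ldots,r+n, \\
& \deg \sigma_0 \le 2k,\ \deg(\sigma_j h_j) \le 2k,\ j = 1,\ldots,r,\ \deg(\sigma_{r+l}\theta_l) \le 2k,\ l = 1,\ldots,n,
\end{array}
\qquad (4.1)
$$

which can be reformulated as a semidefinite programming problem [27]. Then, for any feasible solution $(\lambda,\sigma_0,\sigma_1,\ldots,\sigma_{r+n})$, the polynomial $x \mapsto J_k(x) := \sum_{\beta\in\mathbb{N}^n_{2k}} \lambda_\beta x^\beta$ is of degree at most $2k$ and it satisfies, for all $x \in \Omega = \{x : \theta_l(x) \le 0,\ l = 1,\ldots,n\}$ and $y \in F := \{w : h_j(w) \le 0,\ j = 1,\ldots,r\}$,

$$G(x,y) - \sum_{\beta\in\mathbb{N}^n_{2k}} \lambda_\beta x^\beta = \sigma_0(x,y) - \sum_{j=1}^{r} \sigma_j(x,y) h_j(y) - \sum_{l=1}^{n} \sigma_{r+l}(x,y)\theta_l(x) \ge 0.$$

So, for every $k \in \mathbb{N}$, $J_k(x) \le J(x) := \min_{w\in\mathbb{R}^m}\{G(x,w) : h_j(w) \le 0\}$ for all $x \in \Omega$. Indeed, the next lemma shows that $J_k$ converges to the optimal value function $J$ on $\Omega$ in the $L_1$-norm sense.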


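Lemma 4.4 ([27]) Suppose that Assumption 2.1 holds. For each $k \in \mathbb{N}$, let $\rho_k$ be the optimal value of the semidefinite programming problem (4.1). Let $\epsilon_k \to 0$ and let $(\lambda,\sigma_0,\sigma_1,\ldots,\sigma_{r+n})$ be an $\epsilon_k$-solution of (4.1) in the sense that $\sum_{\beta\in\mathbb{N}^n_{2k}} \lambda_\beta\gamma_\beta \ge \rho_k - \epsilon_k$. Define $J_k \in \mathbb{R}_{2k}[x]$ by $J_k(x) = \sum_{\beta\in\mathbb{N}^n_{2k}} \lambda_\beta x^\beta$. Then, we have $J_k(x) \le J(x)$ for all $x \in \Omega$ and

$$\int_\Omega |J_k(x) - J(x)| \, d\varphi(x) \to 0 \quad \text{as } k \to \infty.$$

We now introduce a scheme to solve the $\epsilon$-approximation problem for arbitrary $\epsilon > 0$, using sequences of semidefinite programming relaxations.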

Algorithm 4.5 (general scheme)

Step 0: Fix $\epsilon > 0$. Set $k = 1$.

Step 1: Solve the semidefinite programming problem (4.1) and obtain a $\frac{1}{k}$-solution $(\lambda^k,\sigma_j^k)$ of (4.1). Define $J_k(x) = \sum_{\beta\in\mathbb{N}^n_{2k}} \lambda^k_\beta x^\beta$.

Step 2: Consider the following semialgebraic set

$$S_k := \{(x,y) : g_i(x,y) \le 0,\ i = 1,\ldots,s,\ h_j(y) \le 0,\ j = 1,\ldots,r,\ G(x,y) - J_k(x) \le \epsilon\}.$$

If $S_k = \emptyset$, then let $k = k+1$ and return to Step 1. Otherwise, go to Step 3.

Step 3: Solve the following polynomial optimization problem

$$
(P_\epsilon^k)\quad
\begin{array}{ll}
\min\limits_{(x,y)\in\mathbb{R}^n\times\mathbb{R}^m} & f(x,y) \\
\text{subject to} & g_i(x,y) \le 0,\ i = 1,\ldots,s, \\
& h_j(y) \le 0,\ j = 1,\ldots,r, \\
& G(x,y) - J_k(x) \le \epsilon.
\end{array}
$$

Step 4: Let $v_\epsilon^k = \min_{1\le j\le k} \mathrm{val}(P_\epsilon^j)$. Update $k = k+1$. Go back to Step 1.

Before we establish the convergence of this procedure, let us comment that the feasibility problem of the semialgebraic set in Step 2 can be tested by a sequence of SDP relaxations via the Positivstellensatz. This was explained in [?] and was implemented in the MATLAB toolbox SOSTOOLS. As explained before, Step 3 can also be accomplished by solving a sequence of SDP relaxations.
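The control flow of Steps 0 to 4 can be sketched on the data of Example 4.3, where everything is small enough to enumerate. The sketch below is not the SDP-based method: as loudly assumed stand-ins, the underestimator is taken to be $J_k = J - 1/k$ with $J(x) = x^2$ (in place of solving (4.1)), and Steps 2 and 3 are carried out by grid enumeration (in place of Positivstellensatz certificates and SDP relaxations). All function names are hypothetical.

```python
def run_scheme(eps, underestimator, f, G, upper_ok, xs, ys, k_max):
    """Grid-based sketch of Algorithm 4.5; returns the running values v_k.

    An entry is None while every S_j encountered so far has been empty.
    """
    values, best = [], None
    for k in range(1, k_max + 1):
        J_k = underestimator(k)  # Step 1 (stand-in for the SDP (4.1))
        # Step 2: enumerate the set S_k on the grid
        S_k = [(x, y) for x in xs for y in ys
               if upper_ok(x, y) and G(x, y) - J_k(x) <= eps]
        if S_k:
            # Step 3: solve (P_eps^k) by enumeration
            val_k = min(f(x, y) for x, y in S_k)
            # Step 4: running minimum over the problems solved so far
            best = val_k if best is None else min(best, val_k)
        values.append(best)
    return values


# Data of Example 4.3: f = y, G = x^2 + y^2, upper constraint x^2 <= 1.
def f(x, y):
    return y


def G(x, y):
    return x**2 + y**2


def upper_ok(x, y):
    return x**2 <= 1.0


def underestimator(k):
    # Assumed stand-in: J_k = J - 1/k with J(x) = x^2, so J_k <= J.
    return lambda x: x**2 - 1.0 / k


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [-1.0, 0.0, 1.0]  # lower-level feasible set {w : w^2 (w^2 - 1)^2 <= 0}

print(run_scheme(0.5, underestimator, f, G, upper_ok, xs, ys, k_max=4))
print(run_scheme(1.5, underestimator, f, G, upper_ok, xs, ys, k_max=4))
```

With $\epsilon = 0.5$ the set $S_1$ is empty, so the loop moves on to $k = 2$ before a value is produced, which is the situation handled by Step 2.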

Let us show that there exists a finite number $k_0$ such that $S_{k_0} \neq \emptyset$, and so, Algorithm 4.5 is well-defined.

Lemma 4.5 Let $\epsilon > 0$. Consider the problem $(P_\epsilon)$ and Algorithm 4.5. Let $K = \{(x,y) : g_i(x,y) \le 0,\ i = 1,\ldots,s\}$ and $F = \{w : h_j(w) \le 0,\ j = 1,\ldots,r\}$. Suppose that Assumption 2.1 holds and $\mathrm{cl}(\mathrm{int}(K \cap (\mathbb{R}^n \times F))) = K \cap (\mathbb{R}^n \times F)$. Then, there exists a finite number $k_0$ such that $S_{k_0} \neq \emptyset$ in Step 2 of Algorithm 4.5.


Proof. Note from Corollary 2.5 that a global minimizer $(\bar{x},\bar{y})$ of (P) exists. In particular, the set $D_0 := \{(x,y) \in K \cap (\mathbb{R}^n \times F) : G(x,y) - J(x) < \epsilon\}$ is a nonempty set as $(\bar{x},\bar{y}) \in D_0$. By our assumption, we have $\mathrm{cl}(\mathrm{int}(K \cap (\mathbb{R}^n \times F))) = K \cap (\mathbb{R}^n \times F)$. This together with the fact that $\{(x,y) : G(x,y) - J(x) < \epsilon\}$ is an open set (as the optimal value function $J$ of the lower-level problem is continuous) gives us that

$$\widehat{D} := \{(x,y) \in \mathrm{int}(K \cap (\mathbb{R}^n \times F)) : G(x,y) - J(x) < \epsilon\}$$

is a nonempty open set. Define $D := \mathrm{Pr}_1 \widehat{D} = \{x \in \mathbb{R}^n : (x,y) \in \widehat{D} \text{ for some } y \in \mathbb{R}^m\}$. Then, $D$ is also a nonempty open set. Note from Lemma 4.4 that $J_k$ converges to $J$ in $L^1(\Omega,\varphi)$-norm. Hence $J_k$ converges to $J$ almost everywhere on $\Omega$. As $\varphi(\Omega) < +\infty$, the classical Egorov's theorem¹ implies that there exists a subsequence $l_k$ such that $J_{l_k}$ converges to $J$ $\varphi$-almost uniformly on $\Omega$. So, there exists a Borel set $A$ with $\varphi(A) < \frac{\eta}{2}$, where $\eta := \varphi(D) > 0$, such that

$$J_{l_k} \to J \quad \text{uniformly over } \Omega \setminus A.$$

We observe that $(\Omega \setminus A) \cap D \neq \emptyset$. (Otherwise, as $D \subseteq \mathrm{Pr}_1 K \subseteq \Omega$, we have $D \subseteq A$. This implies that $\eta = \varphi(D) \le \varphi(A) \le \eta/2$, which is impossible as $\eta > 0$.) Let $x_0 \in (\Omega \setminus A) \cap D$. Then, we have $J_{l_k}(x_0) \to J(x_0)$ and there exists $y_0 \in \mathbb{R}^m$ such that $(x_0,y_0) \in \widehat{D}$, that is, $(x_0,y_0) \in K$, $y_0 \in F$ and $G(x_0,y_0) - J(x_0) < \epsilon$. In particular, for all $k$ large, $(x_0,y_0) \in S_{l_k}$. Therefore, $S_{l_k} \neq \emptyset$ for all large $k$, and so, the conclusion follows.

¹ Egorov's theorem [?, Theorem 2.2] states that: for a measure space $(\Omega,\Sigma,\varphi)$, let $f_k$ be a sequence of functions on $\Omega$. Suppose that $\Omega$ is of finite $\varphi$-measure and $\{f_k\}$ converges $\varphi$-almost everywhere on $\Omega$ to a limit function $f$. Then, there exists a subsequence $l_k$ such that $f_{l_k}$ converges to $f$ almost uniformly in the sense that, for every $\epsilon > 0$, there exists a measurable subset $A$ of $\Omega$ such that $\varphi(A) < \epsilon$, and $\{f_{l_k}\}$ converges to $f$ uniformly on the relative complement $\Omega \setminus A$.

Remark 4.6 The fact that $L_1$-convergence implies almost-uniform convergence for a subsequence can also be seen by using Theorem 2.5.1 ($L_1$-convergence implies convergence in measure) and Theorem 2.5.3 (convergence in measure implies almost-uniform convergence for a subsequence) of [21, pp. 92-93], without requiring the measure of $\Omega$ to be finite.

We note that the condition "$\mathrm{cl}(\mathrm{int}(K \cap (\mathbb{R}^n \times F))) = K \cap (\mathbb{R}^n \times F)$" holds when $C := K \cap (\mathbb{R}^n \times F)$ is a finite union of closed convex sets $C_i$ with $\mathrm{int}\, C_i \neq \emptyset$. Moreover, if the set $C$ is of the form $\{(x,y) \in \mathbb{R}^n \times \mathbb{R}^m : G_i(x,y) \le 0,\ i = 1,\ldots,l\}$ for some polynomials $G_i$, $i = 1,\ldots,l$, and $l \in \mathbb{N}$, then the above condition also holds if the commonly used Mangasarian-Fromovitz constraint qualification [23] is satisfied at every $(x,y) \in C$.

We are now ready to state the convergence theorem of the proposed Algorithm 4.5. The proof of it 
is quite technical and so it is given later in Appendix C. 

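Theorem 4.7 (General bilevel problem (P): Convergence theorem) Let $\epsilon > 0$ and consider problem $(P_\epsilon)$. Let $v_\epsilon^k$ be generated by Algorithm 4.5. Let $K = \{(x,y) : g_i(x,y) \le 0,\ i = 1,\ldots,s\}$ and $F = \{w : h_j(w) \le 0,\ j = 1,\ldots,r\}$. Suppose that Assumption 2.1 holds and $\mathrm{cl}(\mathrm{int}(K \cap (\mathbb{R}^n \times F))) = K \cap (\mathbb{R}^n \times F)$. Then,

(i) $v_\epsilon^k \to v_\epsilon$ as $k \to \infty$, where $\mathrm{val}(P_\epsilon) \le v_\epsilon \le \lim_{\delta\to\epsilon^-} \mathrm{val}(P_\delta)$. In particular, for almost every $\epsilon$, $v_\epsilon^k \to \mathrm{val}(P_\epsilon)$ in the sense that, for all finite intervals $I \subseteq \mathbb{R}_+$, $v_\epsilon = \mathrm{val}(P_\epsilon)$ for all $\epsilon \in I$ except at finitely many points.

(ii) There exists $\epsilon_0 > 0$ such that, for all $\epsilon \in (0,\epsilon_0)$, $v_\epsilon^k \to \mathrm{val}(P_\epsilon)$ as $k \to \infty$. Moreover, let $\delta_k \downarrow 0$. Let $v_\epsilon^k = \min_{1\le j\le k} \mathrm{val}(P_\epsilon^j) = \mathrm{val}(P_\epsilon^{l_k})$ and let $(x_k,y_k)$ be a $\delta_k$-solution of $(P_\epsilon^{l_k})$. Then, $\{(x_k,y_k)\}$ is a bounded sequence and any cluster point $(\bar{x},\bar{y})$ of $\{(x_k,y_k)\}$ is a global minimizer of $(P_\epsilon)$ for all $\epsilon \in (0,\epsilon_0)$.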

We now illustrate how our general scheme can lead to solving a bilevel programming problem with 
a nonconvex lower level problem towards a global solution. This is done by applying our scheme to a 
known test problem from the bilevel programming literature.

Example 4.8 (Illustration of our approximation scheme) Consider the following bilevel optimization test problem (for example, see [31, 37])

$$
\begin{array}{ll}
\min\limits_{(x,y)\in\mathbb{R}^2} & x + y \\
\text{subject to} & x \in [-1,1],\ y \in \operatorname{argmin}_{w\in[-1,1]}\Big\{\dfrac{xw^2}{2} - \dfrac{w^3}{3}\Big\}.
\end{array}
$$

Let $Y(x) := \operatorname{argmin}_{w\in[-1,1]}\{\frac{xw^2}{2} - \frac{w^3}{3}\}$. Clearly, the lower-level problem is nonconvex and all the conditions in Theorem 4.7 are satisfied. The optimal value function of the lower-level problem is given by

$$
J(x) = \min_{w\in[-1,1]}\Big\{\frac{xw^2}{2} - \frac{w^3}{3}\Big\} =
\begin{cases}
0, & \text{if } x \in [\tfrac{2}{3},1], \\
\tfrac{x}{2} - \tfrac{1}{3}, & \text{if } x \in [-1,\tfrac{2}{3}),
\end{cases}
$$

and the solution set of the lower-level problem $Y(x)$ can be formulated as

$$
Y(x) =
\begin{cases}
\{0\}, & \text{if } x \in (\tfrac{2}{3},1], \\
\{0,1\}, & \text{if } x = \tfrac{2}{3}, \\
\{1\}, & \text{if } x \in [-1,\tfrac{2}{3}).
\end{cases}
$$

It is easy to check that the true (unique) global minimizer is $(x,y) = (-1,1)$ and the true global optimal value is $0$.

Now, for $k = 3$, using GloptiPoly 3, we obtain a degree $2k\ (= 6)$ polynomial approximation of $J(x)$ which is

$$J_3(x) = -0.3338 + 0.5011x + 0.0098x^2 - 0.0032x^3 - 0.0696x^4 - 0.1012x^5 - 0.0432x^6.$$

The following figure depicts the graphs of the functions $J_3$ and $J$, where the red curve is the graph of the function $J$ and the blue curve is the graph of the degree 6 polynomial $J_3$. From the graph, we can see that $J_3 \le J$ over the interval $[-1,1]$ and that $J_3$ provides a reasonably good approximation of the piecewise differentiable (and so, non-polynomial) function $J(x)$.

Figure 1: $J(x)$ and its degree-6 underestimation in Example 4.8

Setting $\epsilon = 0.001$ and solving the following polynomial optimization problem

$$
\begin{array}{ll}
\min\limits_{(x,y)\in\mathbb{R}^2} & x + y \\
\text{subject to} & x \in [-1,1], \\
& y \in [-1,1], \\
& \dfrac{xy^2}{2} - \dfrac{y^3}{3} - J_3(x) \le 0.001,
\end{array}
$$

with GloptiPoly 3, the solver returns the point $(x,y) = (-1.0000, 0.9996)$ with its associated function value $-4.1680\mathrm{e}{-04}$, which is a reasonably good approximation of the true global minimizer and global optimal value of the bilevel programming problem.
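The closed form of $J$ and the quality of the printed polynomial $J_3$ can be checked numerically on a grid. The sketch below assumes the lower-level objective is $G(x,w) = xw^2/2 - w^3/3$ and that the printed coefficients of $J_3$ multiply $1, x, \ldots, x^6$ in ascending order; because those coefficients are rounded to four decimals, the computed $J_3$ may exceed $J$ by a tiny amount at a few grid points, so the comparison is reported as a number rather than asserted as an exact inequality.

```python
def G(x, w):
    """Lower-level objective of Example 4.8 (as reconstructed here)."""
    return x * w**2 / 2.0 - w**3 / 3.0


def J(x):
    """Closed-form optimal value function stated in the example."""
    return 0.0 if x >= 2.0 / 3.0 else x / 2.0 - 1.0 / 3.0


# Printed coefficients of J_3, assumed to be in ascending powers of x.
COEFFS = [-0.3338, 0.5011, 0.0098, -0.0032, -0.0696, -0.1012, -0.0432]


def J3(x):
    return sum(c * x**i for i, c in enumerate(COEFFS))


grid = [-1.0 + 2.0 * i / 400 for i in range(401)]  # contains w = 0 and w = 1

# Brute-force lower-level minimum versus the closed form of J.
max_J_err = max(abs(J(x) - min(G(x, w) for w in grid)) for x in grid)

# Largest value of J3 - J on the grid; a value <= 0 means J3 <= J everywhere.
max_overshoot = max(J3(x) - J(x) for x in grid)

# Largest gap J - J3, a crude measure of the approximation quality.
max_gap = max(J(x) - J3(x) for x in grid)

print(max_J_err, max_overshoot, max_gap)
```

The same functions can be used to evaluate the constraint $G(x,y) - J_3(x) \le \epsilon$ at the point returned by the solver.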


Remark 4.9 (Further extensions of the approach) Although we presented our approach for a class of bilevel problems where the constraints of the lower-level problem are independent of the upper-level decision variable $x$, our approach may be extended to solve the following more general bilevel polynomial optimization problem:

$$
(GP)\quad
\begin{array}{ll}
\min\limits_{x\in\mathbb{R}^n,\,y\in\mathbb{R}^m} & f(x,y) \\
\text{subject to} & g_i(x,y) \le 0,\ i = 1,\ldots,s, \\
& y \in Y(x) := \operatorname{argmin}_{w\in\mathbb{R}^m}\{G(x,w) : h_j(x,w) \le 0,\ j = 1,\ldots,r\},
\end{array}
$$

where the constraints of the lower-level problem are allowed to depend on $x$. In this case, we can construct a sequence of semidefinite programming relaxations for finding a global minimizer and a global minimum of its $\epsilon$-approximation problem, and similar convergence results for the scheme can be achieved under the additional technical assumption that the optimal value function of the lower-level problem $J(x) := \min_{w\in\mathbb{R}^m}\{G(x,w) : h_j(x,w) \le 0,\ j = 1,\ldots,r\}$ is continuous. We note that, for the problem (P) discussed in this paper (that is, when the $h_j$ are independent of $x$), this condition is automatically satisfied. On the other hand, this condition may fail for the general problem (GP) even when $n = m = 1$. We provide a simple example to illustrate this. Consider the following bilevel programming problem

$$
\begin{array}{ll}
\min\limits_{(x,y)\in\mathbb{R}^2} & \ldots \\
\text{subject to} & 0 \le x \le 1, \\
& y \in Y(x) := \operatorname{argmin}_{w\in\mathbb{R}}\{(x-w)^2 : x^2 - w \le 0,\ w(w-1) \le 0,\ -w(w-1) \le 0\}.
\end{array}
$$

It can be directly verified that the optimal value function of the lower-level problem is given by

$$
J(x) := \min_{w\in\mathbb{R}}\{(x-w)^2 : x^2 - w \le 0,\ w(w-1) = 0\} =
\begin{cases}
0, & \text{if } x = 0, \\
(x-1)^2, & \text{if } x \in (0,1],
\end{cases}
$$

and is discontinuous at $x = 0$.


5 Numerical Examples


In this section, we apply our schemes to solve some bilevel optimization test problems available in the literature and present the results. We conducted the numerical tests on a computer with a 2.8 GHz Intel Core i7 and 8 GB RAM, equipped with MATLAB 7.14 (R2012a). We solved bilevel polynomial problems with convex as well as non-convex lower-level problems, where the lower-level constraints are independent of the upper-level decision variables.

We first present results for the following bilevel problems with a convex lower-level problem. We note that all the assumptions of Theorem 3.5 are satisfied by these problems.

Example 5.1 Consider the following bilevel polynomial problem [16]

$$
\begin{array}{ll}
\min\limits_{x,y\in\mathbb{R}} & (x-3)^2 + (y-2)^2 \\
\text{s.t.} & -2x + y - 1 \le 0, \\
& x - 2y + 2 \le 0, \\
& x + 2y - 14 \le 0, \\
& 0 \le x \le 8, \\
& 0 \le y \le 10, \\
& y \in \operatorname{argmin}_{w\in\mathbb{R}}\{(w-5)^2 : w \in [0,10]\}.
\end{array}
$$

This problem has a unique global minimizer $(x^*,y^*) = (3,5)$ and the optimal value $f^* = 9$.

Example 5.2 Consider the following bilevel polynomial problem [16]

$$
\begin{array}{ll}
\min\limits_{x,y\in\mathbb{R}} & -(4x-3)y + (2x+1) \\
\text{s.t.} & 0 \le x \le 1, \\
& 0 \le y \le 1, \\
& y \in \operatorname{argmin}_{w\in\mathbb{R}}\{-(1-4x)w - (2x+2) : w \in [0,1]\}.
\end{array}
$$

This problem has a unique global minimizer $(x^*,y^*) = (0.25,0)$ and the optimal value $f^* = 1.5$.

We first transformed the problems in Example 5.1 and Example 5.2 into equivalent single-level non-convex polynomial optimization problems as proposed in Section 3. Then, we used GloptiPoly 3 [18] and the SDP solver Sedumi [32] to solve the transformed polynomial optimization problems. For these two problems, the second relaxation problem (that is, problem $(Q_2)$) of the SDP approximation scheme (3.12) returns a solution which agrees with the true solution.

The following table summarizes the results for bilevel problems with a convex lower-level problem, where $(x^*,y^*)$ and $f^*$ denote the true global minimizer and the true optimal value respectively, $(x,y)$ and $f$ denote the computed minimizer and the computed optimal value respectively, and CPU time is the time (in seconds) used to solve the problem.

Table 1: Convex Lower-Level Problems

Test problem | Known optimal solution | Computed solution | CPU time (s)
Example 5.1 | (x*, y*) = (3, 5), f* = 9 | (x, y) = (3.0000, 5.0000), f = 9.0000 | 0.2511
Example 5.2 | (x*, y*) = (0.25, 0), f* = 1.5 | (x, y) = (0.2500, 0.0000), f = 1.5000 | 0.1957
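The known optimal solutions of Examples 5.1 and 5.2 can be confirmed independently by brute-force enumeration, since both problems have scalar variables. The following sketch is not the SDP scheme of Section 3: it simply assumes the problem data as written in the two examples, restricts $x$ and $y$ to a coarse grid that contains the known solutions, and keeps only those pairs $(x,y)$ for which $y$ minimizes the lower-level objective over the grid. The function name `solve_bilevel_grid` is a hypothetical helper.

```python
def solve_bilevel_grid(f, G, upper_ok, xs, ws, tol=1e-12):
    """Optimistic bilevel problem by enumeration on a grid.

    For each x, the admissible y are the grid minimizers of G(x, .);
    among all admissible and upper-feasible pairs, return the best
    triple (value, x, y).
    """
    best = None
    for x in xs:
        lower_vals = [G(x, w) for w in ws]
        lower_min = min(lower_vals)
        for w, g in zip(ws, lower_vals):
            if g <= lower_min + tol and upper_ok(x, w):
                cand = (f(x, w), x, w)
                if best is None or cand < best:
                    best = cand
    return best


# Example 5.1 (grid step 0.25 on [0, 8] x [0, 10])
sol_51 = solve_bilevel_grid(
    f=lambda x, y: (x - 3) ** 2 + (y - 2) ** 2,
    G=lambda x, w: (w - 5) ** 2,
    upper_ok=lambda x, y: (-2 * x + y - 1 <= 0
                           and x - 2 * y + 2 <= 0
                           and x + 2 * y - 14 <= 0),
    xs=[i / 4 for i in range(33)],
    ws=[i / 4 for i in range(41)],
)

# Example 5.2 (grid step 0.25 on [0, 1] x [0, 1])
sol_52 = solve_bilevel_grid(
    f=lambda x, y: -(4 * x - 3) * y + (2 * x + 1),
    G=lambda x, w: -(1 - 4 * x) * w - (2 * x + 2),
    upper_ok=lambda x, y: True,
    xs=[i / 4 for i in range(5)],
    ws=[i / 4 for i in range(5)],
)

print(sol_51)
print(sol_52)
```

In Example 5.2 the lower-level objective does not depend on $w$ at $x = 0.25$, so every $y \in [0,1]$ is lower-level optimal there and the optimistic formulation selects $y = 0$.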


We now solve the following bilevel problems with a non-convex lower-level problem. Again, all the assumptions in Theorem 4.7 are satisfied by these problems.

Example 5.3 Consider the following bilevel polynomial problem [36]

$$
\begin{array}{ll}
\min\limits_{x,y\in\mathbb{R}} & x \\
\text{s.t.} & -x + y \le 0, \\
& -10 \le x \le 10, \\
& -1 \le y \le 1, \\
& y \in \operatorname{argmin}_{w\in\mathbb{R}}\{w^3 : w \in [-1,1]\}.
\end{array}
$$

This problem has a unique global minimizer $(x^*,y^*) = (-1,-1)$ with the optimal value $f^* = -1$.

Example 5.4 Consider the following bilevel polynomial problem [36]

$$
\begin{array}{ll}
\min\limits_{x,y\in\mathbb{R}} & 2x + y \\
\text{s.t.} & -1 \le x \le 1, \\
& y \in \operatorname{argmin}_{w\in\mathbb{R}}\{-\tfrac{1}{2}xw^2 - \tfrac{1}{4}w^4 : w \in [-1,1]\}.
\end{array}
$$

This problem has two global minimizers $(x_1^*,y_1^*) = (-1,0)$ and $(x_2^*,y_2^*) = (-\tfrac{1}{2},-1)$ with the optimal value $f^* = -2$.

Example 5.5 Consider the following bilevel polynomial problem [36]

$$
\begin{array}{ll}
\min\limits_{x,y\in\mathbb{R}} & y \\
\text{s.t.} & 0.1 \le x \le 1, \\
& -1 \le y \le 1, \\
& y \in \operatorname{argmin}_{w\in\mathbb{R}}\{x(16w^4 + 2w^3 - 8w^2 - \tfrac{3}{2}w + \tfrac{1}{2}) : w \in [-1,1]\}.
\end{array}
$$

This problem has infinitely many global minimizers $(x^*,y^*) = (a,0.5)$ for any $a \in [0.1,1]$ with the optimal value $f^* = 0.5$.

Example 5.6 Consider the following bilevel polynomial problem [36]

$$
\begin{array}{ll}
\min\limits_{x,y\in\mathbb{R}} & -x + xy + 10y^2 \\
\text{s.t.} & -1 \le x \le 1, \\
& y \in \operatorname{argmin}_{w\in\mathbb{R}}\{-xw^2 + w^4/2 : w \in [-1,1]\}.
\end{array}
$$

This problem has a unique global minimizer $(x^*,y^*) = (0,0)$ with the optimal value $f^* = 0$.

We solved these four problems by using the approximation scheme proposed in Section 4, implemented via the software GloptiPoly 3 and the SDP solver Sedumi. For a detailed illustration of how the scheme is implemented, see Example 4.8. The numerical results are summarized in the following table. Note that deg denotes the maximum degree of the polynomial underestimation used in a subproblem of our scheme.


Table 2: Non-Convex Lower-Level Problems

Test problem | Known optimal solution | Computed solution | CPU time (s) | deg
Example 5.3 | (x*, y*) = (-1, -1), f* = -1 | (x, y) = (-1.0000, -1.0000), f = -1.0000 | 1.0746 | 12
Example 5.4 | (x*, y*) = (-1, 0) or (-1/2, -1), f* = -2 | (x, y) = (-0.9991, -0.0020), f = -2.0002 | 5.1432 | 14
Example 5.5 | (x*, y*) = (a, 0.5) for all a in [0.1, 1], f* = 0.5 | (x, y) = (0.2299, 0.4990), f = 0.4990 | 6.8819 | 12
Example 5.6 | (x*, y*) = (0, 0), f* = 0 | (x, y) = (0.0034, -0.0002), f = -0.0034 | 0.8844 | 10
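As a consistency check on the table, the upper-level objectives of Examples 5.3 to 5.6 can be re-evaluated at the computed points and compared with the reported values. The sketch below assumes the upper-level objectives as stated in the four examples; the dictionary layout is an illustrative choice, and agreement is only expected up to the four-decimal rounding of the table entries.

```python
# (upper-level objective, computed point, reported value) per example
computed = {
    "5.3": (lambda x, y: x, (-1.0000, -1.0000), -1.0000),
    "5.4": (lambda x, y: 2 * x + y, (-0.9991, -0.0020), -2.0002),
    "5.5": (lambda x, y: y, (0.2299, 0.4990), 0.4990),
    "5.6": (lambda x, y: -x + x * y + 10 * y**2, (0.0034, -0.0002), -0.0034),
}

# Known optimal values f* from the examples, for the optimality gap.
known = {"5.3": -1.0, "5.4": -2.0, "5.5": 0.5, "5.6": 0.0}

for name, (f, point, reported) in computed.items():
    value = f(*point)
    print(name, value, abs(value - reported), abs(value - known[name]))

# Largest deviation between recomputed and reported objective values.
max_dev = max(abs(f(*pt) - val) for f, pt, val in computed.values())
```

The computed values can lie slightly below the true optimum (as in Examples 5.4 to 5.6) because the scheme solves an $\epsilon$-approximation, whose feasible set is larger than that of the original bilevel problem.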


6 Conclusion and Further Research 

We established that a global minimizer and the global minimum of a bilevel polynomial optimization problem can be found by solving a sequence of semidefinite programming relaxations. We first considered a bilevel polynomial optimization problem where the lower-level problem is a convex problem. In this case, we proved that the values of the sequence of relaxation problems converge to the global optimal value of the bilevel problem under a mild assumption. This shows that a global solution can be found by first transforming the bilevel problem into an equivalent single-level polynomial problem and then solving the resulting single-level problem by the standard sequential SDP relaxations used in polynomial optimization [26].

We then examined a general bilevel polynomial optimization problem with a not necessarily convex lower-level problem. We established that the global optimal value in this case can be found by solving a new sequence of semidefinite programming relaxation problems based on the joint-marginal approach proposed in [27]. This was done by using a sequence of semidefinite programming relaxations of its $\epsilon$-approximation problem under the standard Assumption 2.1 of polynomial optimization, where $\epsilon > 0$ is smaller than a positive threshold.

The convergence of the proposed semidefinite programming approximation scheme relies on Assumption 2.1, which requires that the feasible set of the bilevel problem is bounded. The proposed scheme can also be extended to cover possibly unbounded feasible sets by exploiting coercivity of the objective function of the upper/lower level problem as in our recent papers [21, 22, 23], where the convergence of the sequence of semidefinite programming relaxations was established for polynomial optimization problems with unbounded feasible sets.

Our bilevel problem, in the present paper, represents the so-called optimistic approach to the leader and follower's game, in which the follower is assumed to be co-operative and so, the leader can choose the solution with the lowest cost. The pessimistic approach assumes that the follower may not be co-operative and hence the leader will need to prepare for the worst cost. Mathematically, the following bilevel problem represents the pessimistic approach:

$$
\begin{array}{ll}
\min\limits_{x\in\mathbb{R}^n} \max\limits_{y\in Y(x)} & f(x,y) \\
\text{subject to} & g_i(x) \le 0,\ i = 1,\ldots,s,
\end{array}
$$

where $Y(x) := \operatorname{argmin}_{w\in\mathbb{R}^m}\{G(x,w) : h_j(w) \le 0,\ j = 1,\ldots,r\}$. A possible method for solving this bilevel problem is to construct a polynomial approximation of the optimal value function $x \mapsto \max_{y\in Y(x)} f(x,y)$ using the joint-marginal approach of [27], and then design a semidefinite programming approximation method that is similar to the scheme studied in the present paper. This would be an interesting topic for future research.


Appendix A: Semi-algebraic functions and sets 


In this appendix, we summarize some of the important properties of semi-algebraic functions which are used in this paper (see [?]).

(i) A finite union (resp. intersection) of semi-algebraic sets is semi-algebraic. The Cartesian product (resp. complement, closure) of semi-algebraic sets is semi-algebraic.

(ii) If $f, g$ are semi-algebraic functions on $\mathbb{R}^n$ and $\lambda \in \mathbb{R}$, then $f + g$, $fg$ and $\lambda f$ are all semi-algebraic functions.

(iii) If $f$ is a semi-algebraic function on $\mathbb{R}^n$ and $\lambda \in \mathbb{R}$, then $\{x : f(x) \le \lambda\}$, $\{x : f(x) < \lambda\}$, $\{x : f(x) \ge \lambda\}$ and $\{x : f(x) = \lambda\}$ are all semi-algebraic sets.

(iv) A composition of semi-algebraic maps is a semi-algebraic map.

(v) The image and inverse image of a semi-algebraic set under a semi-algebraic map are semi-algebraic sets. In particular, the projection of a semi-algebraic set is still a semi-algebraic set.

(vi) If $S$ is a compact semi-algebraic set in $\mathbb{R}^m$ and $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is a real polynomial, then the function $x \mapsto \min\{f(x,y) : y \in S\}$ is also semi-algebraic.

Remark 6.1 If $A \subseteq \mathbb{R}^n$, $B \subseteq \mathbb{R}^m$ and $S \subseteq \mathbb{R}^n \times \mathbb{R}^m$ are semi-algebraic sets, then we see that $U := \{x \in A : (x,y) \in S,\ \forall y \in B\}$ is also a semi-algebraic set. To see this, from property (v), we see that $\{x \in A : \exists y \in B,\ (x,y) \notin S\}$ is semi-algebraic. As the complement of $U$ is the union of the complement of $A$ and the set $\{x \in A : \exists y \in B,\ (x,y) \notin S\}$, it follows that the complement of $U$ is semi-algebraic by property (i). Thus, $U$ is also semi-algebraic by property (i). In general, if we have a finite collection of semi-algebraic sets, then any set obtained from them by a finite chain of quantifiers is also semi-algebraic.

For a one-dimensional semi-algebraic function, we further have the following properties:

Lemma 6.2 (Monotonicity Theorem [15]) Let $f$ be a semi-algebraic function on $\mathbb{R}$. Let $a, b \in \mathbb{R}$ with $a < b$. Then, there exists a finite subdivision $a = t_0 < t_1 < \ldots < t_k = b$ such that, on each interval $(t_i,t_{i+1})$, $f$ is continuous and either takes a constant value or is strictly monotone.

Lemma 6.3 (Growth Dichotomy Lemma [35]) Let $\epsilon_0 > 0$ and let $f$ be a continuous semi-algebraic function on $[0,\epsilon_0]$ with $f(0) = 0$. Then either $f$ takes the constant value $0$ over $[0,\epsilon_0]$ or there exist constants $c \neq 0$ and $p, q \in \mathbb{N}_{>0}$ such that $f(t) = ct^{p/q} + o(t^{p/q})$ as $t \to 0^+$.


Appendix B: Proof of Lemma 4.2 


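Proof. [Proof of (i)] From the definition of $(P_\epsilon)$, it is clear that if $0 \le \epsilon_1 \le \epsilon_2$, then $\mathrm{val}(P_{\epsilon_1}) \ge \mathrm{val}(P_{\epsilon_2})$. Using a similar method of proof as in Lemma 4.1, one can show that $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is a lower semicontinuous function. Now, let $\epsilon_k \to \epsilon^+$. Then, from the lower semicontinuity,

$$\liminf_{k\to\infty} \mathrm{val}(P_{\epsilon_k}) \ge \mathrm{val}(P_\epsilon).$$

This together with the fact that $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is nonincreasing shows that $\lim_{k\to\infty} \mathrm{val}(P_{\epsilon_k}) = \mathrm{val}(P_\epsilon)$. So, this function is right continuous.

Let $J(x) := \min_{w\in\mathbb{R}^m}\{G(x,w) : h_j(w) \le 0,\ j = 1,\ldots,r\}$. By property (vi), $J$ is a semialgebraic function. Let

$$X := \{(\epsilon,x,y) \in [0,+\infty) \times \mathbb{R}^n \times \mathbb{R}^m : g_i(x,y) \le 0,\ i = 1,\ldots,s,\ h_j(y) \le 0,\ j = 1,\ldots,r,\ G(x,y) - J(x) \le \epsilon\}$$

and

$$Y := \{(\epsilon,x,y) \in X : f(x,y) \le f(a,b),\ \forall (\epsilon,a,b) \in X\}.$$

We can verify that $X$ and $Y$ are semialgebraic sets by properties (ii), (iii) and Remark 6.1. Further, by definition, the graph of the function $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is given by $\{(\epsilon,f(x,y)) : (\epsilon,x,y) \in Y\}$. Clearly, this set is the image of the set $Y$ under the semialgebraic map $(\epsilon,x,y) \mapsto (\epsilon,f(x,y))$, and hence it is a semialgebraic set by property (v). Thus, $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is a semi-algebraic function on $[0,+\infty)$.

Fix a finite interval $I \subseteq [0,+\infty)$. As $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is a semialgebraic function, it follows from Lemma 6.2 that the function $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is continuous over $I$ except at finitely many points.

[Proof of (ii)] Fix a finite interval $I \subseteq [0,+\infty)$. Denote the discontinuity points of $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ on $I$ by $\{\epsilon_1,\ldots,\epsilon_l\}$ for some $l \in \mathbb{N}$. Clearly, $\min_{1\le i\le l} \epsilon_i > 0$ as $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is right continuous. Let $\bar{\epsilon} = \min_{1\le i\le l}\{\epsilon_i\}/2 > 0$. Then, $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is continuous over $[0,\bar{\epsilon}]$. Applying Lemma 6.3 with $f$ replaced by $\epsilon \mapsto \mathrm{val}(P_\epsilon) - \mathrm{val}(P)$ on $[0,\bar{\epsilon}]$, we see that there exist constants $c > 0$, $p, q \in \mathbb{N}_{>0}$ and $\epsilon_0 \in (0,1)$ with $\epsilon_0 \le \bar{\epsilon}$ such that

$$\mathrm{val}(P) \le \mathrm{val}(P_\epsilon) + c\,\epsilon^{p/q} \le \mathrm{val}(P_\epsilon) + c\,\epsilon^{1/q} \quad \text{for all } \epsilon \in [0,\epsilon_0], \qquad (6.2)$$

where the last inequality holds as $0 \le \epsilon \le \epsilon_0 < 1$. This, together with the nonincreasing property of $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ (which gives $\mathrm{val}(P_\epsilon) \le \mathrm{val}(P)$), yields the last assertion.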


Appendix C: Proof of Theorem 4.7 (Convergence of Algorithm 4.5) 


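Proof. [Proof of (i)] Recall from Lemma 4.4 that $J_k(x) \le J(x)$ for all $k \in \mathbb{N}$ and for all $x \in \Omega$. So, $\mathrm{val}(P_\epsilon^k) \ge \mathrm{val}(P_\epsilon)$ for all $k \in \mathbb{N}$. This implies that $v_\epsilon^k \ge \mathrm{val}(P_\epsilon)$ for all $k \in \mathbb{N}$. As $\{v_\epsilon^k\}$ is a non-increasing sequence which is bounded below, $\lim_{k\to\infty} v_\epsilon^k$ exists. Let $v_\epsilon = \lim_{k\to\infty} v_\epsilon^k$. Then,

$$v_\epsilon \ge \mathrm{val}(P_\epsilon). \qquad (6.3)$$

Let $\delta \in (0,\epsilon)$ and consider problem $(P_{\epsilon-\delta})$. By Assumption 2.1, $K$ and $F$ are compact sets. From the nonsmooth Danskin Theorem (see [?, p. 86]), we see that the optimal value function of the lower-level problem $J(x) := \min_{w\in\mathbb{R}^m}\{G(x,w) : h_j(w) \le 0,\ j = 1,\ldots,r\}$ is locally Lipschitz (and so, is continuous). Thus, a global minimizer of $(P_{\epsilon-\delta})$ exists. Let $(\bar{x},\bar{y})$ be a global minimizer of $(P_{\epsilon-\delta})$. The set $D_0 := \{(x,y) \in K \cap (\mathbb{R}^n \times F) : G(x,y) - J(x) < \epsilon,\ f(x,y) < f(\bar{x},\bar{y}) + \delta\}$ is a nonempty set as $(\bar{x},\bar{y}) \in D_0$. Moreover, from our assumption we have $\mathrm{cl}(\mathrm{int}(K \cap (\mathbb{R}^n \times F))) = K \cap (\mathbb{R}^n \times F)$. This together with the fact that $\{(x,y) : G(x,y) - J(x) < \epsilon,\ f(x,y) < f(\bar{x},\bar{y}) + \delta\}$ is an open set gives us that

$$\widehat{D} := \{(x,y) \in \mathrm{int}(K \cap (\mathbb{R}^n \times F)) : G(x,y) - J(x) < \epsilon \text{ and } f(x,y) < f(\bar{x},\bar{y}) + \delta\}$$

is a nonempty open set. So, $D := \mathrm{Pr}_1 \widehat{D} = \{x \in \mathbb{R}^n : (x,y) \in \widehat{D} \text{ for some } y \in \mathbb{R}^m\}$ is also a nonempty open set. Since $J_k$ converges to $J$ in $L^1(\Omega,\varphi)$-norm, $J_k$ converges to $J$ on $\Omega$ almost everywhere. Moreover, as $\varphi(\Omega) < +\infty$, the classical Egorov's theorem guarantees that there exists a subsequence $l_k$ such that $J_{l_k}$ converges to $J$ $\varphi$-almost uniformly on $\Omega$. So, there exists a Borel set $A$ with $\varphi(A) < \frac{\eta}{2}$, where $\eta := \varphi(D) > 0$, such that $J_{l_k} \to J$ uniformly over $\Omega \setminus A$. As in the proof of Lemma 4.5, we can show that $(\Omega \setminus A) \cap D \neq \emptyset$. Let $x_0 \in (\Omega \setminus A) \cap D$. Then, we have $J_{l_k}(x_0) \to J(x_0)$ and there exists $y_0 \in F$ such that $G(x_0,y_0) - J(x_0) < \epsilon$ and $f(x_0,y_0) < f(\bar{x},\bar{y}) + \delta$. So, for all large $k$, $G(x_0,y_0) - J_{l_k}(x_0) < \epsilon$. Thus, for all large $k$, $(x_0,y_0)$ is feasible for $(P_\epsilon^{l_k})$ and

$$v_\epsilon^{l_k} \le \mathrm{val}(P_\epsilon^{l_k}) \le f(x_0,y_0) \le f(\bar{x},\bar{y}) + \delta = \mathrm{val}(P_{\epsilon-\delta}) + \delta.$$

Letting $k \to \infty$, we obtain that $v_\epsilon = \lim_{k\to\infty} v_\epsilon^{l_k} \le \mathrm{val}(P_{\epsilon-\delta}) + \delta$. Letting $\delta \to 0^+$, we see that

$$v_\epsilon \le \lim_{\delta\to\epsilon^-} \mathrm{val}(P_\delta). \qquad (6.4)$$

Therefore, the inequality $\mathrm{val}(P_\epsilon) \le v_\epsilon \le \lim_{\delta\to\epsilon^-} \mathrm{val}(P_\delta)$ follows by combining (6.3) and (6.4). To see the second assertion in (i), we only need to notice from Lemma 4.2(i) that $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is continuous except at finitely many points over a finite interval $I$, so that $\lim_{\delta\to\epsilon^-} \mathrm{val}(P_\delta) = \mathrm{val}(P_\epsilon)$ at every continuity point $\epsilon$.

[Proof of (ii)] From Lemma 4.2(ii), we see that there exists $\epsilon_0 > 0$ such that $\epsilon \mapsto \mathrm{val}(P_\epsilon)$ is continuous over $(0,\epsilon_0)$. Thus, from (i), we have $v_\epsilon = \mathrm{val}(P_\epsilon)$ for all $\epsilon \in (0,\epsilon_0)$. Now, fix any $\epsilon \in (0,\epsilon_0)$. Let $\delta_k \downarrow 0$ as $k \to \infty$. Let $v_\epsilon^k = \min_{1\le j\le k} \mathrm{val}(P_\epsilon^j) = \mathrm{val}(P_\epsilon^{l_k})$ and let $(x_k,y_k)$ be a $\delta_k$-solution of $(P_\epsilon^{l_k})$. Then, $\{(x_k,y_k)\} \subseteq K \cap (\mathbb{R}^n \times F)$. As $K$ and $F$ are compact, we see that $\{(x_k,y_k)\}$ is a bounded sequence. Let $(\bar{x},\bar{y})$ be a cluster point of $\{(x_k,y_k)\}$. Clearly, $(\bar{x},\bar{y}) \in K \cap (\mathbb{R}^n \times F)$. Note that $J_k \le J$ on $\Omega$ for all $k \in \mathbb{N}$, $x_k \in \mathrm{Pr}_1 K \subseteq \Omega$, and $(x_k,y_k)$ is feasible for $(P_\epsilon^{l_k})$. Hence, for each $k \in \mathbb{N}$,

$$G(x_k,y_k) - J(x_k) \le G(x_k,y_k) - J_{l_k}(x_k) \le \epsilon.$$

Passing to the limit and noting that $J$ is continuous, we get that $G(\bar{x},\bar{y}) - J(\bar{x}) \le \epsilon$. So, $(\bar{x},\bar{y})$ is feasible for $(P_\epsilon)$. Finally, since $v_\epsilon^k \to \mathrm{val}(P_\epsilon)$, it follows that

$$f(\bar{x},\bar{y}) = \lim_{k\to\infty} f(x_k,y_k) \le \lim_{k\to\infty} (v_\epsilon^k + \delta_k) = \mathrm{val}(P_\epsilon)$$

and $(\bar{x},\bar{y})$ is a global minimizer of $(P_\epsilon)$.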